This study has addressed the use of built-in-test (BIT) for fault
detection, isolation, and repair in the Military Computer Family (MCF).
Particular emphasis was given to the identification and assessment of BIT
techniques applicable to the MCF-AN/UYK-41(V). The MCF-AN/UYK-41(V) is
software compatible with Digital Equipment Corporation's PDP-11/70.

The approach taken in this study was to assume a fault population,
predict where in the system these faults are most likely to occur, and
develop a rationale for deploying built-in fault detection and
localization resources accordingly. The fault population assumed
included both stuck-at and transient faults. Using a failure
prediction program based on MIL-HDBK-217B, it was determined that,
for related computers, it is likely that 60% of all faults will
occur in memory, 30% will be in the CPU, and the remainder will
occur throughout the rest of the computer, including the power
supply.

A unified built-in-test approach with appropriate inter-level
and user communications was identified which provides fault
detection coverage at the module, chassis, and system levels. The
required BIT resources were characterized at each hierarchical level
with the intent of serving as the basis for modifying the MCF-AN/UYK-41(V)
form, fit, and function specifications to include built-in-test. Using
state-of-the-art digital hardware including microprocessors, it was
predicted that with 10 to 30% additional BIT hardware, approximately
80-95% of all assumed faults can be detected. To satisfy the MCF
objective of minimizing false module pulls, and in keeping with the
proposed MCF two-level maintenance philosophy, on-site, off-line
module/chassis testing should be considered as a testing adjunct to BIT.
1.0  INTRODUCTION


The Army, through the Center for Tactical Computer Systems (CENTACS),
Office of the Communications Research and Development Command (CORADCOM),
is working cooperatively with the Navy and the Air Force on a new approach
to developing and acquiring computers for the military [1]-[4]. The effort
is known as the Military Computer Family (MCF) Program. The goal of the
MCF Program is to provide defense system developers with a
software-compatible family of standard, modular computers. The MCF program
stresses software compatibility with prior generation computers at the
level a programmer needs to know to write time-independent machine language
programs. At the same time, the MCF concept calls for hardware
compatibility through standardized Form, Fit and Function (F³)
specifications.

A consequence of being able to use existing software and state-of-the-art
modular hardware is the potential reduction in total computer system
life cycle cost (LCC). An important aspect of the proposed MCF procurement
procedure with potential LCC savings is the concept of hardware vendor
warranties. The MCF hardware warranty concept is viewed as a means of
reducing logistic support costs through improved reliability. Implicit in
this approach is the necessity of 1) knowing with a high degree of
confidence that a module is not performing properly, 2) identifying which
module is faulty, and 3) effecting repair through module replacement. To
meet this need, an effective and efficient means of detecting and locating
hardware faults is necessary.

The present study has addressed the use of built-in-test (BIT), couched
in a unified testing framework, as a means of monitoring system
performance and aiding in fault detection and isolation. Built-in-test as
defined in this study refers to computer hardware, firmware and software
resources which exist for the purpose of performance monitoring and fault
detection and isolation.

The nature of built-in-test is such that it must be considered very
early in the system conceptual phase. Further, it is essential that a
unified approach to testing be established in order to ensure maximum fault
detection coverage at minimum additional BIT hardware and software cost.
This, in turn, must be done within an overall maintenance framework which
makes sense in view of the plans and objectives of MCF.

1.1  General  Discussion  of  Study  Objectives

The Military Computer Family program addresses the need within DOD for
new generation digital hardware while maintaining software compatibility
with prior generation machines. In addition, the MCF concept goes beyond
the transportability of software from old machines to new. Under the MCF
program, a whole new procurement process is made possible through modular
hardware computer structures. In particular, the modular hardware
structure of MCF allows components to be procured simultaneously from
multiple sources.

In order to ensure the success of this competitive procurement process
and to motivate vendors to produce reliable form, fit and function
compatible designs, MCF components will be procured with vendor-backed
warranties. The MCF warranty concept provides incentives for vendors to
produce reliable components. At the same time, however, it places on the
government the burden of identifying faulty modules with a very high
degree of confidence in order for warranties to be exercised. It is,
therefore, essential that means be provided for identifying and locating
faulty MCF components.

An additional aspect of the MCF warranty concept is the necessity of
measuring component on-time if the components are to be warrantied for a
pre-determined number of hours. A later section of this report deals with
the elapsed time measurement question which, in addition to being important
for warranty validation, may be useful in fault prediction and
localization.

Finally, it is vital for MCF users to know system operating
capabilities and limitations at all times. On-line performance monitoring
is of utmost importance to field commanders and others who rely on computer
systems to provide them with combat-critical information. It is, therefore,
a primary objective of BIT for MCF to meet this requirement through timely
fault detection and alerting.
In summary, the objectives of BIT for MCF are:

1.  To provide continuous system monitoring and indication of system
malfunction,

2.  To  diagnose  the  cause  of  system  malfunction  to  a module  level  with 
a low  probability  of  a false  module  pull,  and 

3.  To  measure  and  record  module  elapsed  power-on  time. 

The  importance  of  these  objectives  and  their  impact  on  the  recommended 
BIT  approach  for  MCF  will  become  evident  in  later  sections  of  this  report. 

1.2  Scope

The scope of this study includes the identification and evaluation of
a recommended built-in-test approach for the AN/UYK-41(V) member of the
Military Computer Family.* The study scope encompasses the
conceptualization of an on-line maintenance approach for MCF and a detailed
understanding of BIT techniques particularly suited to the AN/UYK-41(V)
member and its constituent chassis and modules. Included in the proposed
BIT approach is the identification and assessment of suitable on-line and
idle time test techniques which are compatible with system level fault
diagnostics and user interfaces. The present study results will be used as
the technical basis for modifying the AN/UYK-41(V) form, fit and function
specifications to ensure the incorporation of the recommended BIT approach.
This study does not include consideration of fault tolerant design
approaches but rather addresses the issues specifically related to fault
detection, isolation, and repair through module replacement.

1.3  Study  Approach 

The  approach  taken  in  this  study  was  as  follows: 

1.  Review AN/UYK-41(V) form, fit and function specifications. Review
previous module, chassis and system level BIT approaches.

2.  Based  on  BIT  cost/performance  guidelines  supplied  by  the  MCF/BIT 
selection  committee  working  group,  select  candidate  BIT  approaches 
for  the  detection  and  identification  of  failed  modules. 


3.  Present descriptions of the candidate BIT approaches to the
MCF/BIT selection committee.

4.  Determine relative cost/performance of the candidate BIT
approaches. Identify areas where the target BIT performance
criteria can be modified to result in improved BIT performance
and/or lower BIT cost.

5.  Select a recommended BIT approach for the MCF computer system
AN/UYK-41(V).

6.  Present the recommended approach and supporting rationale to
the MCF/BIT selection committee for evaluation.

* The AN/UYK-41(V) and AN/GYQ-21(V) nomenclatures will be used
interchangeably throughout this report. Both refer to the MCF version of
the PDP-11/70.

An essential part of the present study was to understand the results
of previous related studies. In particular, it was necessary to review the
AN/UYK-41(V) form, fit and function specifications to understand the
alternatives and limitations for BIT in this member of the Military
Computer Family. At the same time, documentation on computers similar to
the AN/UYK-41(V) was reviewed. The DEC PDP-11/70 was of major interest
because of the software architectural similarities. The AN/AYK-14(V) was
also relevant, chiefly because of the similarities between the
AN/UYK-41(V) and AN/AYK-14(V) buses.

Also  of  interest  were  machines  with  extensive  on-line  fault  detection, 
localization  and,  in  some  instances,  recovery  capabilities.  The  following 
machines  in  this  category  were  considered: 

1.  Self-Testing  and  Repairing  (STAR)  Computer-JPL  [5] 

2.  Bell  System  1A  Processor  (No.  4 ESS)  [6] 

3.  DEC  PDP-11/60  [7] 

Information on these machines was reviewed in order to maximize the
transfer of useful fault detection and isolation features to the new
generation machine.

In order to facilitate the objective evaluation of candidate
built-in-test approaches, it was essential that a set of BIT evaluation
parameters be established to serve as a basis upon which to select
candidate BIT approaches.


The impact of candidate BIT approaches on MCF hardware and software
design time must also be considered. For purposes of this study it will be
assumed that, unless otherwise noted, these cost factors are proportional
to the percentage increase in hardware and software required to support the
recommended BIT approach.

The study approach called for the identification of cost/performance
guidelines followed by the selection of candidate BIT approaches. The
selection of candidate BIT approaches was based, in part, on an analysis
which depicts where failures are most likely to occur in computers with
architectures similar to the AN/UYK-41(V). An important consideration in
this study, in addition to fault detection, was the set of fault
communication alternatives for different BIT approaches, including
redundancy in the fault reporting process.

Based  upon  all  of  these  factors,  candidate  approaches  were  rank- 
ordered  and  a particular  approach  selected.  The  recommended  BIT  approach 
for  MCF  is  discussed  in  detail  in  the  following  sections  of  this  report. 

1.4  Report  Organization

The  organization  of  this  report  reflects  the  study  methodology  in  that 
background  information  about  MCF  in  general  and  the  MCF-AN/UYK-41(V)  in 
particular,  is  presented  in  Sections  1.0  and  2.0.  Section  1.0  reviews  the 
general  goals  and  objectives  of  MCF  as  they  relate  to  built-in-test. 

Section 2.0 provides an in-depth description of the AN/UYK-41(V) including
system specifications and physical characteristics. The remaining portion
of Section 2.0 discusses the MCF two-level maintenance concept and its
impact on BIT. Included in this section is a discussion of the assumed MCF
fault population and the BIT performance and cost measures of interest.

Section 3.0 presents a review of related prior generation computer
fault detection and isolation approaches. Because of the special
relationship of the MCF-AN/UYK-41(V) to the Digital Equipment Corporation
PDP-11/70, it is important to understand DEC's hardware and software
maintenance approach. By the same token, it is important to understand the
built-in-test incorporated in the AN/AYK-14(V) because of the similarity in
the AN/UYK-41(V) and the AN/AYK-14(V) bus structures. Two fault tolerant
computers were reviewed to ascertain the relevance of their fault detection
rationale to MCF. Finally, the relatively recently introduced DEC PDP-11/60
maintenance approach was considered because of its architectural
similarities to the PDP-11/70 and its provisions for self-test through the
plug-in Diagnostic Control Store (DCS) board.

Section 4.0 provides the overall framework for distributing BIT
resources throughout the MCF AN/UYK-41(V). Included also is a description
of fault reporting requirements with appropriate user interfaces.

In order to provide a quantitative basis for allocating BIT resources
in the MCF-AN/UYK-41(V), a failure rate analysis of the PDP-11/70 was made.
The results of this analysis are described in Section 5.0. Conclusions
concerning where failures are most likely to occur are drawn from this
analysis as well as from other similar machines.

Section 6.0 contains specific recommendations for BIT at the module
level. Included are recommendations for analog, memory, CPU, and I/O
modules plus preliminary recommended approaches for testing the
AN/UYK-41(V) bus.

Section 7.0 discusses the module/chassis elapsed time measurement
problem. Two distinct approaches for determining elapsed time are
presented.

In Section 8.0 the performance and cost measures are analyzed to
evaluate the effectiveness of the built-in-test recommended in
Sections 4.0 and 6.0.

Section 9.0 concludes the report with a summary and specific
recommendations for further study. This section identifies areas where
additional work is needed.


2.0  GENERAL SYSTEM DESCRIPTION AND BUILT-IN-TEST CONSIDERATIONS FOR THE
MCF AN/UYK-41(V) COMPUTER SYSTEM

2.1  MCF  AN/UYK-41(V)  System  Description 

2.1.1  Summary  of  the  System  Specifications 

The Form, Fit, and Function (F³) specifications for the MCF
AN/UYK-41(V) [AN/GYQ-21(V)] computer system have been written by ITEK
Corporation. The chart in Figure 2.1 describes the structure of the
various ITEK documents which will be referenced throughout this report.

The AN/UYK-41(V) is a modular computer system which may be configured
in various ways to meet the requirements of a large number of applications.
There are basically two types of systems possible: a single processor
system and a dual processor system. The characteristics of these two types
of systems are given in Tables 2.1 and 2.2, respectively.

The architecture and the instruction set of the AN/UYK-41(V) have been
designed such that it may be used to emulate the Digital Equipment
Corporation's PDP-11/70 computer.


2.1.2  System  Configuration 

An AN/UYK-41(V) computer system consists of multiple chassis. The
single processor system consists of one Main Computer Chassis No. 1, one
Memory Expansion Chassis, and one I/O Expansion Chassis as shown in
Figure 2.2. The dual processor system consists of one Main Computer
Chassis No. 2, two Memory Expansion Chassis, and two I/O Expansion Chassis
as shown in Figure 2.3.

These chassis are interconnected via interface cables and mounted in
a rack assembly or other suitable structure per system application
requirements.

The hardware for the entire system is partitioned into pluggable
modules. These modules are then used as standard building blocks to
configure functionally large or small computers depending upon the
application. The types of standard modules available for the AN/UYK-41(V)
computer system are listed in Table 2.3 along with a brief description of
their functions.

Each chassis within an AN/UYK-41(V) computer system will carry its own
power supply for all modules contained therein. Each chassis will also
TABLE 2.1. MCF AN/GYQ-21(V) SINGLE PROCESSOR
COMPUTER SYSTEMS CHARACTERISTICS [8]

                                          Configuration
System Parameters      Minimum System          Maximum System          Typical System
---------------------  ----------------------  ----------------------  ----------------------
Size                   1 type III chassis      17 type III chassis     3 type III chassis

Weight (excluding      80 pounds max           1360 pounds max         240 pounds max
cables)

Power dissipation      500 W max, option 1*    2200 W max, option 1    930 W max, option 1
(excluding I/O         485 W max, option 2*    2850 W max, option 2    960 W max, option 2
expand chassis,
data comm chassis
and fan(s))

Memory capacity        64K words               2000K words             256K words

I/O capacity           4 MCF I/O chan          84 MCF I/O chan         4 MCF I/O chan
                       UNIBUS®                 UNIBUS®                 UNIBUS®
                                               32 data comm channels   16 data comm channels

Instruction execution  550 KOPS min, option 1  550 KOPS min, option 1  550 KOPS min, option 1
throughput             700 KOPS min, option 2  700 KOPS min, option 2  700 KOPS min, option 2

Reliability            1800 hrs min, option 1  240 hrs min, option 1   875 hrs min, option 1
(excluding I/O         1600 hrs min, option 2  180 hrs min, option 2   700 hrs min, option 2
expand chassis,
data comm chassis
and fan(s))

* Option 1 - non-volatile main memory
  Option 2 - volatile main memory

® UNIBUS is a registered trademark of the Digital Equipment Corporation,
  Maynard, Massachusetts

TABLE 2.2. MCF AN/GYQ-21(V) DUAL PROCESSOR
COMPUTER SYSTEMS CHARACTERISTICS [8]

                                          Configuration
System Parameters      Minimum System           Maximum System           Typical System
---------------------  -----------------------  -----------------------  -----------------------
Size                   5 type III chassis       33 type III chassis      7 type III chassis

Weight (excluding      400 pounds max           2640 pounds max          560 pounds max
cables)

Power dissipation      1530 W max, option 1*    4000 W max, option 1     1530 W max, option 1
(excluding I/O         1540 W max, option 2*    5250 W max, option 2     1540 W max, option 2
expand chassis,
data comm chassis,
and fan(s))

Memory capacity        128K words               4000K words              784K words

I/O capacity           20 MCF I/O chan          160 MCF I/O chan         20 MCF I/O chan
                       2 UNIBUS®                16 UNIBUS®               2 UNIBUS®
                                                64 data comm channels    32 data comm channels

Instruction execution  1000 KOPS min, option 1  1000 KOPS min, option 1  1000 KOPS min, option 1
throughput             1200 KOPS min, option 2  1200 KOPS min, option 2  1200 KOPS min, option 2

Reliability            540 hrs min, option 1    130 hrs min, option 1    540 hrs min, option 1
(excluding I/O         440 hrs min, option 2    100 hrs min, option 2    440 hrs min, option 2
expand chassis,
data comm chassis,
and fan(s))

* Option 1 - non-volatile main memory
  Option 2 - volatile main memory

® UNIBUS is a registered trademark of the Digital Equipment Corporation,
  Maynard, Massachusetts

Figure  2.2.  AN/GYQ-21(V)  Main  Computer  Chassis  No.  1
Expansion  Capability  [9] 

TABLE 2.3. MCF AN/UYK-41(V) HARDWARE

Module   Name                           Function
-------  -----------------------------  ---------------------------------------
CPU3     Central Processing Unit No. 3  32 bit processor to implement
                                        AN/UYK-41(V) instruction set, and
                                        provide MCF Bus control.

NRAM     Non-volatile Random Access     32K or 64K words of non-volatile,
         Memory                         random access, read/write memory.

VRAM     Volatile Random Access         32K or 64K words of volatile random
         Memory                         access, read/write memory.

PCM2     Power Conversion Module        Regulated dc power supplies operating
         No. 2                          from single phase ac power to provide
                                        +5V, -12V, and +15V outputs.

BEM      Bus Extender Module            Interface drivers/receivers with
                                        bidirectional control for extending
                                        MCF buses to provide interchassis
                                        communications.

BIM2     Bus Interface Module No. 2     Interface between an MCF member's
                                        busing system and an AN/UYK-41(V)
                                        I/O busing system (DEC PDP-11/70
                                        UNIBUS).

MCM3     Memory Control Module No. 3    Four-port memory access arbiter
                                        module.

I/O MOD  I/O Modules                    Specific I/O interfaces and I/O
                                        controllers. Specifications are yet
                                        to be defined.

contain the appropriate backplane connectors for the modules to be plugged
in that chassis. The typical complements of modules to be used with the
Main Computer Chassis 1 and 2, the Memory Expansion Chassis and the I/O
Expansion Chassis are shown in Figures 2.4, 2.5, 2.6, and 2.7, respectively.

2.1.3  Physical  Characteristics 

All chassis will exist in one of three configurations (Type I, II and
III), each having the same cross-section and varying in length depending
on functional complexity and growth provisions. The standard dimensions
are: height 7.62 inches, width 10.12 inches, and length 12.56, 15.56, or
19.56 inches. Each chassis will provide mechanical support and cooling to
the MCF modules. Cooling provisions include externally supplied cooling
air directed through plenums and/or conduction to auxiliary heat
exchangers. Each chassis consists of an enclosure, backplane interconnect
and input/output connectors as defined by the specific chassis requirements.

The modules are the lowest level replaceable units of the Military
Computer Family. All modules will have the same cross-section (height 6.0
inches, width 9.0 inches) and will vary in depth depending on the
functional complexity (standard depths 0.5, 1.0, 1.5, 3.0, 3.5 or 5.0
inches). All modules are conduction cooled, and all modules except the
power supply modules employ NAFI type connectors (76, 152, or 304 pins) for
electrical interface with the chassis.

2.2  Reliability  and  Maintainability  of  MCF  Computer  Systems 


2.2.1  Environment 

The MCF computer systems are expected to operate reliably under a wide
range of environmental conditions. The details of environmental
specifications under operating and non-operating conditions for the
chassis, as well as the modules, are given in the ITEK specifications [13],
[14]. These include mechanical, electrical and thermal specifications.
These environmental conditions have been considered when performing the
reliability calculations described below.
2.2.2  Reliability

The reliability of hardware components is measured in terms of the
mean time between failure (MTBF). The MTBF calculations are done assuming
a constant failure rate for all components except those with a known
limited life. The failure rates are derived from MIL-HDBK-217. Those
components for which a failure rate is not listed in MIL-HDBK-217 have been
assigned an estimated failure rate. Unless otherwise indicated, the MTBF
calculations have been performed for the equipment operated in a mobile
ground application in an ambient temperature of 71°C for the chassis and a
ramp clamp temperature of 85°C for the modules.

The MTBFs of the modules vary considerably, from 6,000 hours
for the CPU3 module to 40,000 hours for the BEM module.
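
Under the constant failure rate assumption stated above, the failure rate of a module is the reciprocal of its MTBF, and if any single module failure is taken to fail the chassis, the module failure rates add. The sketch below illustrates that arithmetic. Only the two MTBF figures come from this section; the two-module chassis and the function name `series_mtbf` are hypothetical and chosen purely for illustration.

```python
def series_mtbf(module_mtbfs_hours):
    """MTBF of a series arrangement of modules with constant failure rates.

    Each module's failure rate is 1/MTBF; the rates add, and the combined
    MTBF is the reciprocal of the total rate.
    """
    if not module_mtbfs_hours:
        raise ValueError("at least one module MTBF is required")
    total_failure_rate = sum(1.0 / mtbf for mtbf in module_mtbfs_hours)
    return 1.0 / total_failure_rate


# Hypothetical chassis containing one CPU3 (6,000 h) and one BEM (40,000 h).
chassis_mtbf = series_mtbf([6000.0, 40000.0])
print(round(chassis_mtbf, 1))
```

The combined figure is always lower than the MTBF of the least reliable module, which is why the module with the shortest MTBF dominates the chassis-level estimate.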

2.2.3  Maintenance  Philosophy 

It is assumed, for the purposes of recommending built-in-test for the
MCF computer systems, that maintenance on these computers is to be
provided through a two-level service structure. This two-level maintenance
structure is shown in Figure 2.8. The maintenance will be done under a
module warranty concept.

The lowest replaceable unit in the Military Computer Family is the
module. In the field, a malfunction in the computer system will be
detected and isolated to one or more faulty modules. These faulty modules
will be replaced in the field by good spare modules. The mean time to
repair (MTTR) on the MCF chassis will be less than 30 minutes [13]. MTTR
is defined as the time required to isolate a fault to a module, replace
that module, and return the chassis to operational status. The fault
isolation to the module shall be accomplished by built-in-test.
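
To see how the MTTR limit interacts with MTBF, the sketch below applies the standard steady-state availability ratio MTBF / (MTBF + MTTR). This ratio is not stated in the report and is used here as an assumption; the 1,000 hour MTBF is a hypothetical placeholder, and only the 30 minute MTTR bound comes from the text.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Fraction of time a repairable unit is operational: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical chassis MTBF of 1,000 h with the 30 minute (0.5 h) MTTR bound.
print(steady_state_availability(1000.0, 0.5))
```

The ratio counts only active repair time as defined above; it ignores logistic delays such as waiting for a spare module.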

The suspected bad modules will be shipped to a repair facility (perhaps
that of the manufacturer) where they will be further tested to detect and
isolate faulty components. The modules will be repaired at the repair
facility and returned to the field.
Figure  2.8.  Two-Level  MCF  Maintenance  Philosophy


2.3  Built-In-Test  Considerations 
2.3.1  Overall  Built-In-Test  Objectives 

In  order  to  accomplish  the  reliability  and  maintainability  goals  for 
the  MCF  computer  systems,  certain  general  objectives  as  discussed  in 
Section  1.1  were  set  forth  in  the  statement-of-work  for  this  study.  They 
are  briefly  restated  below. 

1.  Continuous  monitoring  and  indication  of  system  malfunction. 

2.  Diagnosis  of  system  malfunction  to  a module  level  with  a low 
probability  of  false  module  pull. 

3.  Measurement  and  recording  of  module  elapsed  power-on  time. 

These objectives, along with an assumed fault population, serve as the
guidelines for the detailed study on built-in-test requirements for the MCF
computer systems.


2.3.2  Fault  Population 

Hardware faults in digital computer systems can be classified in two
basic categories: stuck-at (solid or permanent) faults and intermittent
(transient) faults.* Stuck-at faults occur when a logic signal
remains permanently in either a one or a zero state. Such failures are
consistent and the failure symptoms are reproducible. This facilitates the
isolation of stuck-at faults through well defined diagnostic test
procedures.
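
The reproducibility of stuck-at faults can be illustrated with a small simulation. The sketch below models a hypothetical three-input circuit, y = (a AND b) OR c, which is not taken from the AN/UYK-41(V) design, forces one node to a fixed value, and lists the input vectors whose output differs from the fault-free circuit. Those vectors are the ones a diagnostic test would have to apply to expose the fault.

```python
from itertools import product


def circuit(a, b, c, stuck_node=None, stuck_value=None):
    """Evaluate y = (a AND b) OR c, optionally forcing node 'n' or 'y'."""
    n = a & b                      # internal node
    if stuck_node == "n":
        n = stuck_value            # node permanently held at 0 or 1
    y = n | c                      # circuit output
    if stuck_node == "y":
        y = stuck_value
    return y


def detecting_vectors(stuck_node, stuck_value):
    """Input vectors for which the faulty output differs from the good one."""
    return [
        (a, b, c)
        for a, b, c in product((0, 1), repeat=3)
        if circuit(a, b, c) != circuit(a, b, c, stuck_node, stuck_value)
    ]


print(detecting_vectors("n", 0))   # tests exposing node n stuck-at-0
print(detecting_vectors("n", 1))   # tests exposing node n stuck-at-1
```

Because the faulty node never changes, the same vectors expose it every time they are applied, which is the property that well defined diagnostic procedures rely on.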

Intermittent faults, on the other hand, are defined as random
failures that prevent the proper operation of a unit for a short period,
implying that the duration of the failures is not long enough for the
application of a test procedure designed for permanent faults [15].
Intermittent faults occur for environmental as well as non-environmental
reasons. Environmental conditions such as temperature, humidity,
vibration, electrical and electromagnetic interference, etc., induce
intermittent faults. More important, however, are the non-environmental
intermittent faults, which are caused by loose connections, resistance
variations, deteriorating or aging components, etc.

* Definitions of commonly used fault detection, isolation, and repair terms
may be found in Appendix A of this report.

Recent studies in fault diagnosis [15], [16], [17] indicate that a
major portion of digital system malfunctions are caused by intermittent
faults. In some systems, 80 to 90 percent of the faults are estimated to
be intermittent [15]. Furthermore, these faults have been found to account
for more than 90 percent of the total maintenance expense because they are
difficult to detect and isolate.

2.3.3  Built-In-Test  Functions 

There  are  several  functions  that  the  built-in-tests  must  perform  with 
the  ultimate  objective  of  enhancing  the  maintainability  of  the  MCF  computer 
system.  These  functions  are  listed  below. 

1.  Fault  Detection 

2.  Fault  Isolation 

3.  Fault  Indication 

4.  Fault  Communication  (Reporting) 

5.  Fault  Logging 

6.  Fault  Characterization  (stuck-at,  transient) 

7.  Fault  Handling  (Error  Recovery) 

The built-in-test functions listed above are usually implemented in
hardware (including firmware) and in software. The primary objectives are
fault detection and isolation. These two objectives may be
accomplished in various ways: on-line (using either continuous monitoring
or periodic sampling), during idle time, and off-line. The definitions
of these terms can be found in Appendix A.

Fault indication implies some form of audible or visual cue to the
operator. This indication may take the form of indicator lights,
alpha-numeric displays, or printed messages regarding the status of the
system. If the system becomes non-operational, then sufficient fault
isolation information should be available to the operator to enable a
repair.

Fault communication, logging, and characterization are also essential
functions which provide a means for establishing the health status of the
computer system at any given time. These aid in fault diagnosis and
ultimately in automatic error recovery, if possible. Faults detected at
the lowest level (module level) should be communicated to the higher
levels (chassis, system levels). Fault communication may be done via
regular data paths in the system or via separate fault communication
channels.

Fault logging implies any means of keeping a detailed record of the
failures as they occur. Such an error record provides diagnostic feedback
and is necessary to characterize the types of faults that occur most
frequently. This is particularly useful in diagnosing failures due to
transient or intermittent faults. Since the failure symptoms due to
intermittent faults are not easily reproducible, the isolation of
intermittent faults relies more on the accumulated error statistics.
Isolation of an intermittent fault becomes easier if it can be mapped to
a set of intermittent faults which can be probabilistically related to
known sources of failures.
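
As a rough illustration of isolating intermittent faults from accumulated
error statistics, the sketch below tallies logged error events per module
and ranks the suspects. The record fields, the module names, and the
ranking rule are assumptions made for this example; the report does not
prescribe a log format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRecord:
    """One logged error event (all field names are hypothetical)."""
    time_s: float   # time of detection, seconds since power-up
    module: str     # module that reported the error
    symptom: str    # short symptom code, e.g. "parity" or "timeout"


def rank_suspects(log):
    """Return (module, error_count) pairs, most frequently reported first."""
    counts = Counter(record.module for record in log)
    return counts.most_common()


# Hypothetical log: three parity errors on one module, one stray timeout.
log = [
    FaultRecord(12.0, "MEM-2", "parity"),
    FaultRecord(340.5, "MEM-2", "parity"),
    FaultRecord(341.0, "IO-1", "timeout"),
    FaultRecord(900.2, "MEM-2", "parity"),
]
suspects = rank_suspects(log)
print(suspects)
```

A symptom that cannot be reproduced on demand may still stand out in such
a tally once enough events have been logged.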

Once a fault has been detected while the computer is executing a certain
instruction of the application program, several responses are possible,
depending on the type of fault, the machine status at the time the fault
occurred, and the recovery features designed into the computer
architecture. All of these responses to a failure may be broadly
classified under fault handling. Typically, one of the following actions
occurs when a fault is detected.


1.  Abort  current  instruction  and  halt. 

2.  Branch  control  immediately  to  diagnostic  hardware,  firmware  or 
software  for  fault  diagnosis. 

3. Attempt instruction retry.

Although fault handling is an essential function of the built-in-test,
it is considered to be beyond the scope of this report. Therefore, it will
not be treated in any greater detail. Emphasis will be placed on the
detection and identification of the failed modules.



2.3.4 Built-In-Test Approaches

In view of the built-in-test objectives stated in Section 1.1 and the
Form, Fit, and Function (F3) specifications summarized in Section 2.1,
a top-down approach to the selection of candidate BIT techniques is
recommended. In the Military Computer Family, the following hierarchical
levels are easily identifiable.

1.  MCF  Member  Level  (System  Level) 

2.  Chassis  Level 

3.  Module  Level 

Built-in-test techniques which will be considered may be incorporated at
any one or a combination of the above mentioned hierarchical levels. The
basic approach used in this study is to identify a set of BIT techniques
at each level and then select candidate BIT techniques based on certain
performance versus cost criteria. The BIT effectiveness criteria for
performance and cost are discussed in the next section.

Each hierarchical level affords a certain level of fault detection and
a degree of fault isolation capability because of the observability and
controllability problems. In order to enhance the performance/cost figure
of the candidate BIT techniques, it is necessary to study the fault
detection requirements and the BIT resources available at each
hierarchical level. Furthermore, the fault communication and
hardware/software interfaces between the various constituent BIT elements
at each hierarchical level need to be investigated.

In summary, fault detection and identification at the various levels may
be performed using continuous monitoring, sampled monitoring, idle time
monitoring, or other off-line techniques. Approaches which provide
continuous monitoring with minimum impact on system performance will be
emphasized.


2.3.5 Built-In-Test Effectiveness Criteria for Performance and Cost

The  effectiveness  of  any  built-in-test  approach  may  be  measured  in 
terms  of  the  ratio  of  its  performance  to  the  cost  of  implementing  it.  In 
quantifying  the  performance/cost  ratio  there  are  a significant  number  of 
parameters  or  sets  of  parameters  which  can  be  considered. 

In view of the broad objectives of the built-in-tests for the MCF computer
systems, a set of general parameters has been chosen. Since the main
objective of the BIT for the MCF is to detect and isolate faults with a
low probability of false module pull, the performance parameters should be
able to measure the probabilities of detecting and localizing faults as
well as the probabilities of false alarms. Furthermore, since the mean
time to repair is also an essential consideration in the maintenance of
MCF computer systems, the performance parameters should include the time
required to detect and isolate faults. This forms a set of five
performance measurement parameters which are defined below.

P_SFD = Probability of System Fault Detection

P_LFE = Probability of Localizing to Faulty Element

P_FA  = Probability that suspected Faulty Element is not Faulty
        (False Alarm)

t_SFD = Time to System Fault Detection

t_LFE = Time to Localize to Faulty Element
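
One way these five parameters might be estimated in practice is from the
outcomes of a series of fault-insertion trials. The sketch below is only an
illustration: the trial record layout, the choice to normalize P_LFE by all
inserted faults, and the use of simple means for the two times are
assumptions, not definitions taken from this report.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    """Outcome of one fault-insertion trial (all fields are hypothetical)."""
    faulty: str                  # element that actually held the fault
    detect_s: Optional[float]    # time to detection, None if never detected
    suspect: Optional[str]       # element the BIT pointed at, None if none
    localize_s: Optional[float]  # time from notification to localization


def summarize(trials):
    """Estimate the five performance parameters from a non-empty trial list."""
    detected = [t for t in trials if t.detect_s is not None]
    accused = [t for t in trials if t.suspect is not None]
    correct = [t for t in accused if t.suspect == t.faulty]
    return {
        "P_SFD": len(detected) / len(trials),
        "P_LFE": len(correct) / len(trials),
        "P_FA": (len(accused) - len(correct)) / len(accused) if accused else 0.0,
        "t_SFD": mean(t.detect_s for t in detected) if detected else None,
        "t_LFE": mean(t.localize_s for t in correct) if correct else None,
    }


trials = [
    Trial("MEM", 0.5, "MEM", 20.0),   # detected and correctly localized
    Trial("CPU", 1.5, "IO", 40.0),    # detected, wrong element suspected
    Trial("IO", None, None, None),    # fault never detected
    Trial("MEM", 1.0, "MEM", 10.0),   # detected and correctly localized
]
stats = summarize(trials)
print(stats)
```

Here P_FA is computed over the elements that were actually suspected, which
matches the wording "probability that a suspected faulty element is not
faulty".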

The  cost  of  implementing  a BIT  approach  can  be  broadly  categorized 
into  hardware  and  software  costs.  The  hardware  costs  mainly  involve  space 
(A),  power  (P),  and  failure  rate  (FR).  The  hardware  cost  can  be  measured 
in  terms  of  the  percent  increase  in  the  space,  power,  and  failure  rate  due 
to  the  additional  BIT  circuitry. 

The software costs, on the other hand, are more difficult to assess. The
software is impacted at three levels: 1) operating system software (OS),
2) applications software (AS), and 3) diagnostic software (DS). Additional
BIT functions typically increase the operating system responsibilities
because it must provide for the user/BIT interface and may have to perform
error handling tasks. The BIT functions are generally transparent to the
user. However, the application software will be impacted if the user is to
be provided with the option to control some of the BIT functions. The
diagnostic software can generally be simplified by additional BIT
hardware.

Figure 2.9 and Table 2.4 summarize the effectiveness criteria used in
assessing the BIT approaches for the MCF computer systems.

TABLE 2.4. MCF BUILT-IN-TEST PERFORMANCE/COST CRITERIA

Performance

Criteria                                 Comment

1. Probability of on-line detection      Implies fault detection and
   of a system malfunction (P_SFD)       notification of fault.

2. Time to detection of system           Includes fault detection error
   malfunction (t_SFD)                   latency plus user notification
                                         time.

3. Probability that a suspected          Refers to maintenance false alarm
   faulty element is not faulty          rate.
   (P_FA)

4. Time to localize to a faulty          Time between user initial
   system element (t_LFE)                notification that system has
                                         malfunctioned and when user
                                         determines which element is
                                         faulty.

5. Probability of localizing to a        Probability of determining which
   faulty system element (P_LFE)         module, chassis, member is
                                         faulty.

Cost

Criteria                                 Comment

Space (A)                                Includes board space, chassis
                                         slots, module pinouts, chip
                                         count, etc.

Power (P)                                Refers to additional power
                                         required by BIT circuitry.

Failure Rate Increase (FR)               Reduction in module, chassis
                                         and/or system MTBF due to BIT
                                         circuitry.

Operating System (OS)                    May be impacted if error handling
                                         is part of the OS responsibility.
                                         Also can be impacted by user/BIT
                                         interface.

Applications Software (AS)               May be impacted if user is
                                         provided with the option to
                                         control some of the BIT functions.

Diagnostic Software (DS)                 Generally can be simplified by BIT

3.0 OVERVIEW OF SOME RELATED PRIOR GENERATION COMPUTER BUILT-IN-TEST
FEATURES

In an effort to build on the BIT knowledge and experience that other
people have developed, a review of prior computers was carried out.
Careful attention was paid to BIT techniques, not only in fault detection
schemes, but in error handling and fault reporting approaches also. Some
of these computers have fault tolerant features built into them that
affect their maintainability. Special note was made of these features when
they might be applicable to the MCF computer systems. Five computers were
selected for this detailed fault detection/reporting study. The STAR
computer and 1A Processor (from the No. 4 ESS) were chosen because of
their extensive fault tolerant features. The PDP-11/60 and 11/70 were
included in this group as representatives of current commercial
minicomputers. The last computer in this group is the AN/AYK-14(V), which
represents a modern military minicomputer.

3.1  STAR 

The Jet Propulsion Laboratory developed the STAR computer to be used in
space missions where on-site repair was impossible. Therefore, it was
necessary to design a computer that was ultra-reliable. Toward this goal,
fault tolerance was used extensively. While the MCF computers will not be
designed for complete fault tolerance, many of the fault detection
techniques used in the STAR's fault tolerance can be used.

In the STAR computer, all machine words, both data and instructions, are
encoded in error-detecting codes. Fault detection occurs concurrently with
the execution of the programs. The error-detecting codes are supplemented
by monitoring circuits which serve to verify the proper synchronization
and internal operations of the functional units. Each functional unit is
autonomous and contains its own sequence generator, as well as storage for
the current operation code, operands and results. One out of every ten
clock cycles is used to report status (error) information to the central
control unit. Status message originating circuits within the I/O, as well
as the status lines, are duplicated to allow the detection of a fault in
the status message. The absence of an expected "Output Active" message is
also a fault condition. Finally, some more critical I/O units are
duplicated to ensure that all operations are performed correctly.

3.2 1A Processor

The 1A Processor of the No. 4 ESS (Electronic Switching System) was
developed by the Bell System to handle a large number of long distance
telephone calls with an availability very close to 100%. In fact, the
objective was less than two minutes a year down time. With reliability and
availability goals this high, it is necessary to use fault tolerance
techniques to allow operation of the unit until the failed part can be
replaced. Special attention was given to the fault detection techniques,
as these are essential to BIT also. A block diagram of this processor is
shown in Figure 3.1.

In the 1A Processor, all subsystems have redundant units that are
connected to the basic system via a redundant bus system. The central
control unit is fully duplicated; the two units operate in step and
compare all results. The memory subsystem that contains the program
consists of a prime set plus two "roving spare" units. In the event of a
failure, the contents of the faulty memory are copied from the duplicate
copy to one of the roving spare units. The memory that stores the data on
transient calls is fully duplicated on-line.

Parity checks are performed on all communications, that is, on both
address and data over all buses. Within subsystems, there are interval
self-checking timers that can detect major timing errors and lack of
subsystem response. Each peripheral device is polled to determine its
status. Under program control a signal can be sent to each I/O device to
request an automatic response, which checks the I/O loops. All vital
communications buses are duplicated and transmitted information contains
redundant information for error detection. In addition, transformers are
used to couple the bus to minimize the probability that a faulty I/O
device could make the bus completely unusable for all other devices on the
bus. Internal parity is carried with most of the information within each
subsystem. Software checks as well as the normal hardware checks are used
to detect parity errors, so that each verifies the proper operation of the
other.

3.3 PDP-11/60

The PDP-11/60 is a general purpose commercial minicomputer manufactured by
the Digital Equipment Corporation (DEC) [7]. It is a 16-bit user
microprogrammable machine. Several test and fault diagnosis features have
been incorporated in the PDP-11/60 design that make it more easily
maintainable. The PDP-11/60 features related to built-in-test are
discussed below.

1. Diagnostic/Bootstrap Loader. The bootstrap loader program is stored in
a special ROM along with a rudimentary diagnostic program. This diagnostic
is executed each time the system is bootstrapped. It tests the central
processor, cache memory, main memory and the basic PDP-11/60 instruction
set. Hardware problems detected by the diagnostics cause the computer to
halt and the fault signature is displayed on the console panel.

2. Diagnostic Control Store (DCS) Module. This module has a 2K x 48-bit
ROM which contains microdiagnostics for testing the CPU. The module has
its own self-testing diagnostic microcode. The microdiagnostics can be
initiated from the console panel or the DCS module itself. LEDs on the DCS
module indicate an error code which can be looked up in a fault directory
to determine the defective CPU board(s).

3. Error Logging. The CPU logs error information into special scratchpad
registers at the time of error. This error log includes UNIBUS data,
physical address, cache address, cache data, next microaddress, and last
interrupt vector at the time of error. This error log can be read from the
console panel or used by diagnostic programs for fault isolation.

4. Parity bit(s) and associated parity generation/checking are available
on cache and main memory (core). For semiconductor main memory (MOS),
error correcting code (ECC) is also optionally available.

5. Software Diagnostics. There are several types of diagnostic software
available for fault detection, isolation, and reliability tests. This
software typically resides on mass storage devices and must be loaded in
the main memory before being executed.

Table 3.1 summarizes the fault diagnosis features available on the
PDP-11/60. It is interesting to note that even with the above described
hardware diagnostic features, the PDP-11/60 system does rely heavily on
off-line stand-alone diagnostic software for fault isolation.
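
The parity generation and checking listed above for cache and main memory
can be sketched in a few lines. This is a generic illustration only: a
single even-parity bit over a 16-bit word is an assumption made for the
example and is not claimed to be the scheme DEC actually used.

```python
def parity_bit(word, width=16):
    """Return the even-parity bit for the low `width` bits of `word`."""
    return bin(word & ((1 << width) - 1)).count("1") % 2


def store(word):
    """Model a memory write: keep the data word together with its parity bit."""
    return word, parity_bit(word)


def load(stored):
    """Model a memory read: raise if the stored parity no longer matches."""
    word, parity = stored
    if parity_bit(word) != parity:
        raise ValueError("parity error")
    return word


cell = store(0o052525)
assert load(cell) == 0o052525          # an undisturbed word reads back cleanly

single_flip = (cell[0] ^ 0x0004, cell[1])
try:
    load(single_flip)
    single_flip_detected = False
except ValueError:
    single_flip_detected = True        # any single-bit error is caught

double_flip = (cell[0] ^ 0x0003, cell[1])
double_flip_value = load(double_flip)  # two flipped bits cancel and pass
```

The last line shows the known limit of simple parity: an even number of
bit errors in a word goes undetected, which is one reason an error
correcting code may be preferred for semiconductor memory.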

3.4  PDP-11/70 

The PDP-11/70 is an older but larger computer than the PDP-11/60 [19].
It is a 16-bit medium range general purpose computer. It has all of the
fault detection and isolation features discussed above for the PDP-11/60
with the exception of the Diagnostic Control Store. Although the PDP-11/70
is a microprogrammed machine, it is not user microprogrammable. Its
initial design did not allow for expansion of the micromemory to
incorporate microdiagnostics. However, this does not mean that
microdiagnostics cannot be incorporated in the MCF-AN/UYK-41(V), which
emulates the PDP-11/70 instruction set. Table 3.2 summarizes the fault
detection and isolation features available on the PDP-11/70 computers.

In addition to the above discussed fault detection and isolation features,
the PDP-11/70 and also the PDP-11/60 have certain on-line fault reporting
and subsequent error handling features. The following general philosophy
is followed. The hardware faults are classified into soft errors and hard
errors. All errors when detected are logged in error status registers. The
soft errors are generally those from which recovery is possible. It is the
responsibility of the system software or the application software to check
the error bits in the error status registers to determine if a soft error
has occurred and provide the necessary error handling. The hard errors, on
the other hand, are not recoverable and cause a trap either midway through
an instruction or upon completion of the current instruction. Each hard
error or group of hard errors causes a trap (vectored interrupt) via a
predefined location in the memory. It is the responsibility of the system
software to provide trap handling routines to diagnose the fault. In some
cases, e.g., cache and memory management units, partial instruction retry
is attempted before generating the trap. So, there is a combination of
hardware/software to provide a limited amount of error correction and
recovery. Such features of the PDP-11/70 are summarized in Table 3.3.

3.5 AN/AYK-14(V)

The AN/AYK-14(V) is a standard airborne computer designed by the Control
Data Corporation [21]. It is a subset of the recently developed CDC-480
computer family. The AN/AYK-14(V) computer system provides 16 module types
which can be configured in various combinations in three different chassis
types. This feature, like that of the MCF computer systems, makes it a
variable configuration, general purpose minicomputer.

The AN/AYK-14(V) has several built-in-test features worth mentioning.
Resident in the system are the BIT hardware, BIT firmware and In-Flight
Performance Monitoring (IFPM) software which are used to detect and
isolate a faulty computer chassis in the field. The faulty computer
chassis is sent to a shop level maintenance facility where a
Loader/Verifier (L/V) is used to isolate the malfunctioning Shop
Replaceable Assembly (SRA) through Fault Isolation Diagnostic (FID)
software. The faulty SRA is forwarded to a repair facility at the depot
level for isolation and repair of the faulty components through the use of
Automatic Test Equipment (ATE) [21].

The fault detection and isolation features for the AN/AYK-14(V) are
summarized in Table 3.4.

4.0 MCF BUILT-IN-TEST DESCRIPTION AND ASSUMPTIONS

The primary objective of this section is to formulate a coherent structure
for the built-in-tests to be specified for the MCF computer systems and
define a set of guidelines from which specific recommendations for the
built-in-test hardware and software can be developed. In the process of
defining an overall built-in-test strategy, several assumptions have been
made regarding the operation and maintenance of the MCF computer systems.
These assumptions together with the concepts involved in designing
built-in-test features are discussed in this section.

The  major  concepts  presented  in  this  section  are  given  below.  These 
concepts  are  not  new  and  in  the  past  some  of  these  concepts  have  in  fact 
been  put  into  practice  on  military  as  well  as  commercial  computer  systems. 

1.  Distribution  of  responsibility  for  conducting  built-in-test  among 
three  (3)  hierarchical  levels  (module,  chassis,  and  system  levels) 
of  the  MCF  computer  systems. 

2.  Stand-alone,  self-test  capability  at  the  chassis  level  for  all 
chassis  within  the  system. 

3.  User  selectable/programmable  built-in-test  features  to  allow  the 
BIT  functions  to  be  tailored  to  meet  the  requirements  of  a wide 
range  of  applications. 

4. A building block or layered approach for conducting built-in-test.
In this approach, the most basic hardware functions are tested first.
Once the basic hardware blocks are checked out, they can be used in
testing larger and more complex hardware functions. This approach reduces
the total test equipment cost.

5. A provision for alternative, independent testing configurations to
allow a degree of overlap or redundancy in the built-in-test. This implies
having more than one way of testing a faulty element on the system.
Although this increases the total test equipment required, it decreases
the false alarms by cross checking (veri-

4.1 Three-Level Built-In-Test Hierarchy

It  should  be  recalled  from  the  description  presented  in  Section  2.1 
that  the  MCF  computer  systems  are  functionally  partitioned  into  modules. 
These modules plug into a backplane within a chassis and communicate with
each  other  via  a common  bus  structure.  A chassis  when  populated  with  an 
appropriate  complement  of  modules  forms  a working  subsystem.  A set  of 
chassis  interconnected  with  cables  forms  a complete  system. 

Furthermore,  recall  that  the  chassis  are  of  two  basic  types:  1)  Main 
Computer  Chassis,  2)  Expansion  Chassis.  These  two  types  of  chassis  differ 
in  one  important  respect.  The  Main  Computer  Chassis  contains  the  central 
intelligence  of  the  system  in  the  CPU3  module  which  is  not  available  in  the 
Expansion Chassis. In fact, in the single processor system in the MCF
AN/UYK-41(V), the Main Computer Chassis No. 1 can by itself be operated as
a complete system without any Expansion Chassis. However, no Expansion
Chassis by itself can constitute a system. This distinction is important
from the built-in-test viewpoint because a certain degree of decision
making is required in performing all of the BIT functions mentioned in
Section 2.3.3.

This type of partitioned system structure lends itself to a similar
partitioning in the implementation of built-in-tests. It is desirable to
distribute the responsibilities of testing to the various hierarchical
levels, namely, the module, chassis, and system levels, rather than
concentrating them at any one level or delegating them to one particular
module. The following built-in-test levels are to be identified:

1.  Module  Level  BIT  (MBIT) 

2.  Chassis  Level  BIT  (CBIT) 

(a)  Expansion  Chassis  BIT  (ECBIT) 

(b)  Main  Computer  Chassis  BIT  (MCBIT) 

3.  System  (or  Member)  Level  BIT  (SBIT) 

The Module Level BIT is the lowest level while the System Level BIT is
the  highest  level.  In  order  to  distinguish  between  the  BIT  in  the  Main 
Computer  and  the  Expansion  Chassis,  they  have  been  assigned  separate  BIT 
levels. 


4.2  Fault  Coverage 

One  of  the  primary  goals  of  the  built-in-tests  for  the  MCF  computer 
system  is  to  provide  continuous  system  monitoring  and  indication  of  system 
malfunction. This implies that on-line fault detection and isolation
together with local fault indication must be provided. It is conceived that
this  will  be  the  primary  responsibility  of  the  Module  Level  BIT.  The  BIT 
at  the  module  level  has  the  advantage  of  being  able  to  access  the  signals 
and  test  points  internal  to  the  module.  Good  observability  is  important  in 
on-line  testing.  However,  on-line  testing  has  its  disadvantage  in  that  it 
is  restricted  to  passive,  non-interfering  and  non-disruptive  techniques. 
This  restriction  places  certain  limitations  on  the  fault  coverage  that  can 
be  obtained  through  on-line  testing  at  module  level. 

In order to increase fault coverage, built-in-tests at chassis and system
levels can be employed to test the hardware on the modules. Since the
modules are defined as functional building blocks which are interconnected
by a common bus structure (data, address, and control paths), testing at a
functional level is more appropriate. Functional level testing in this
context implies generating an input test pattern to excite a certain
function on the module and comparing the response to a predetermined
value. This is an active form of testing. Active testing is usually done
either off-line or during idle time because it is generally interfering
and disruptive.
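
The functional-level test just described, applying an input pattern and
comparing the response with a predetermined value, might be sketched as
follows. The 16-bit adder, the injected stuck-at fault, and the test
vectors are all hypothetical and chosen only to make the example
self-contained.

```python
def run_functional_test(function_under_test, vectors):
    """Apply each input pattern and compare the response with the expected value.

    Returns a list of (inputs, expected, actual) tuples, one per mismatch.
    """
    failures = []
    for inputs, expected in vectors:
        actual = function_under_test(*inputs)
        if actual != expected:
            failures.append((inputs, expected, actual))
    return failures


def good_adder(a, b):
    """Fault-free 16-bit adder (carry out discarded)."""
    return (a + b) & 0xFFFF


def faulty_adder(a, b):
    """Same adder with a hypothetical fault: result bit 0 stuck at 0."""
    return (a + b) & 0xFFFE


# Each entry pairs an input pattern with its predetermined response.
ADDER_VECTORS = [
    ((0x0000, 0x0000), 0x0000),
    ((0x0001, 0x0000), 0x0001),
    ((0xFFFF, 0x0001), 0x0000),
    ((0x5555, 0xAAAA), 0xFFFF),
]

good_result = run_functional_test(good_adder, ADDER_VECTORS)
faulty_result = run_functional_test(faulty_adder, ADDER_VECTORS)
print(len(good_result), len(faulty_result))
```

Only the vectors whose expected response has bit 0 set expose this
particular fault, which illustrates why the choice of test patterns
determines the fault coverage obtained.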

It should also be mentioned that apart from additional fault coverage,
the chassis and system level tests may be used to provide fault
verification and thereby reduce the false alarm rate. On-line tests at the
module level typically will not discriminate between intermittent and
stuck-at (solid) faults. Error logging and off-line tests, which can be
provided at the chassis and system levels, are particularly useful for
making this distinction.
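
A minimal sketch of how repeated off-line retesting could separate solid
from intermittent faults is given below. The classification rule (fails on
every retest means solid, fails on some means intermittent, never fails
means not reproduced) and the repeat count are assumptions for
illustration, not a procedure specified in this report.

```python
def characterize(test, repeats=10):
    """Re-run a test and classify the fault from the pass/fail history.

    `test` is a zero-argument callable that returns True when the test passes.
    """
    results = [test() for _ in range(repeats)]
    failures = results.count(False)
    if failures == 0:
        return "not reproduced"   # possible transient or false alarm
    if failures == repeats:
        return "solid"            # behaves like a stuck-at fault
    return "intermittent"


# Hypothetical test outcomes used to exercise the three cases.
always_fails = characterize(lambda: False)
never_fails = characterize(lambda: True)
history = iter([True, True, False, True])
sometimes_fails = characterize(lambda: next(history), repeats=4)
print(always_fails, never_fails, sometimes_fails)
```

A "not reproduced" result is where the accumulated error log becomes the
main source of evidence, since the retest alone cannot confirm the
original indication.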

4.3  Built-in-Test  Objectives 

In view of the above general discussion regarding the 3-level
built-in-test hierarchy, reasonable objectives should be set forth for the
BIT at each level so that more detailed specifications can be developed.
The following sections define these objectives.




4.3.1  Module  Level  Built-In-Tests 


1. The Module Level BIT should provide adequate on-line, local fault
detection for the hardware functions implemented on a given module.

2.  It  should  provide  a continuous  indication  of  the  module  status 
(operate/failed)  on  the  module. 

3.  It  should  communicate  (report)  all  faults  to  the  higher  level  BITs 
(chassis  and  system  level). 

4.  It  should  assist  the  higher  level  BITs  in  performing  idle  time, 
periodic,  or  off-line  tests  to  extend  fault  coverage. 

4.3.2  Chassis  Level  Built-In-Tests 


1. The Chassis Level BIT should provide additional fault coverage for
modules within a chassis by using idle time, periodic, or off-line
testing techniques.

2. It should provide an alternative, independent means for indicating
faults detected in any module within the chassis. This is in addition to
and separate from the fault indicators on the modules.

3.  It  should  assist  the  higher  level  BIT  (system  level)  in  idle  time, 
periodic  or  off-line  testing  of  modules  within  a chassis. 

4.  It  should  assist  the  higher  level  BIT  in  communicating,  logging, 
and  characterizing  of  all  faults  detected  within  a chassis  in 
on-line  mode. 

5.  It  should  provide  an  alternative,  stand-alone  means  for  testing 
the  modules  within  a chassis  in  an  off-line  mode. 

4.3.3  System  Level  Built-in-Tests 

1.  The  System  Level  BIT  should  provide  additional  fault  coverage  for 
all  modules  within  the  system  by  using  idle  time,  periodic,  or 
off-line  testing  techniques. 

2. It should provide means for logging and characterizing all faults
detected within the system in on-line mode.

3. It should report the faults to the operator.

4. It should provide an interface with the operator for troubleshooting
and general maintenance of the system.
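
The division of responsibility set out in Sections 4.3.1 through 4.3.3 can
be pictured with a small model in which module faults are collected by the
chassis level and then logged and reported by the system level. The class
names, the polling scheme, and the message format below are hypothetical;
the report leaves the actual communication mechanism open.

```python
class ModuleBIT:
    """Module level: detects a local fault and holds the module status."""

    def __init__(self, name):
        self.name = name
        self.failed = False       # drives the on-module operate/failed indicator

    def detect(self):
        self.failed = True


class ChassisBIT:
    """Chassis level: independent indication of faults within one chassis."""

    def __init__(self, name, modules):
        self.name = name
        self.modules = modules
        self.indicator = []       # chassis-level display, separate from modules

    def collect(self):
        self.indicator = [m.name for m in self.modules if m.failed]
        return [(self.name, module_name) for module_name in self.indicator]


class SystemBIT:
    """System level: logs faults and reports them to the operator."""

    def __init__(self, chassis):
        self.chassis = chassis
        self.log = []

    def poll(self):
        reports = [r for c in self.chassis for r in c.collect()]
        self.log.extend(reports)
        return [f"FAULT chassis={c} module={m}" for c, m in reports]


cpu, mem = ModuleBIT("CPU-0"), ModuleBIT("MEM-0")
io = ModuleBIT("IO-0")
system = SystemBIT([ChassisBIT("MAIN-1", [cpu, mem]), ChassisBIT("EXP-1", [io])])

mem.detect()                      # a module-level detector trips
messages = system.poll()
print(messages)
```

Keeping a separate indicator at the chassis level mirrors the stated
objective of an alternative, independent means of fault indication.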

4.4  Built-in-Test  Resource  Characterization 

In order to realize the above mentioned objectives of the built-in-tests,
certain hardware and software resources would be required at each of the
three levels.

At the module level, the BIT functions performed are mainly fault
detection, indication, and communication. Fault detection and isolation
are synonymous at this level because the faults have to be localized only
to a module. It is possible to implement these BIT functions using
non-intelligent, fixed and programmable hardware logic circuits which can
reside on the module. The programmable hardware will be required to
externally enable certain BIT features to facilitate the testing of the
hardware on the module from the Chassis or System Level BITs.

The Chassis Level BIT is responsible for testing all of the modules
(including itself) within a chassis. It must, therefore, be capable of
providing almost all of the BIT functions with the exception of error
handling, which can only be done at the system level. It must also perform
active tests on the modules. For these reasons, it is envisioned that at
the Chassis Level the built-in-tests can best be implemented using
intelligent hardware with certain decision making capability, such as a
microprocessor. This intelligent hardware may be supported by software in
local storage. Typical software would consist of test patterns and simple
diagnostic routines. Furthermore, the Chassis Level BIT hardware and
software may be placed in a separate module within the chassis. A single
Chassis Level BIT module may be designed which can be programmed to meet
the built-in-test requirements of the various types of chassis. In
addition, a simple maintenance panel accessible to the operator may be
added to each chassis on which the fault signature may be displayed. This
maintenance panel should also have a few switches with which the operator
can initiate the chassis level diagnostics. Such an arrangement will
provide a stand-alone self-test capability at the chassis level.

The System Level BIT will reside in the Main Computer Chassis where
the microcode and intelligence of the CPU module can be used. It is
assumed that most of the built-in-tests at the system level will be
software or firmware based. The diagnostic software may be placed in
non-volatile main memory or may be resident on an external mass storage
medium. It is also assumed that a sophisticated system console panel will
be available which can be used by the operator to control the CPU during
both normal and maintenance operations. During normal operations, the
operator can input regular commands to the operating system. During
maintenance testing, the operator should be able to directly access, via
the console panel, those CPU functions that make diagnosis possible.
Furthermore, it is envisioned that the console panel function may be
extended not only to provide an on-site (local) diagnostic facility, but
optionally also to provide a capability to conduct diagnosis from a remote
site.

Table 4.1 summarizes the characteristics of the BIT hardware and
software resources required to implement the built-in-tests at the three
hierarchical levels.

4.5  Built-In-Test  Equipment  Configuration  and  Interface  Definition 

The  diagrams  in  Figure  4.1  help  visualize  the  physical  placement  of 
the  above  discussed  built-in-test  equipment  (BITE). 

More detailed block diagrams including the additional built-in-test
equipment are shown for each type of MCF chassis in Figures 4.2, 4.3, 4.4,
and 4.5. From these block diagrams, general interface definitions can be
evolved for communication among the three levels of built-in-tests and
other constituent modules of the MCF computer systems. These interfaces are
identified in Figures 4.2 through 4.5 by letters A, B, C, D, and E. The
definitions and functions of these interfaces are given below.

A.  Interface  between  Chassis  Level  BIT  Module  and  all  other  MCF 
modules.  This  interface  uses  the  existing  MCF  bus  structure  and 
will  be  used  for  two  purposes.  1)  To  allow  the  Chassis  Level  BIT 
Module  to  conduct  idle  time  or  off-line  tests  on  all  other  MCF 
modules  within  the  chassis.  2)  To  directly  report  any  detected 
faults to the CPU module.

TABLE 4.1  SUMMARY OF THE MCF BIT RESOURCES

Hierarchical
BIT Level       BIT Resource Characteristics
------------    -------------------------------------------------------
Module          (a) Non-intelligent fixed and programmable hardware
                    resident on the module

Chassis         (a) Intelligent hardware and diagnostic software (or
                    firmware) resident on a separate module within the
                    chassis
                (b) A maintenance panel on the chassis accessible to
                    the operator

System          (a) Intelligent hardware and diagnostic firmware
                    resident on the CPU module
                (b) Diagnostic software resident on non-volatile main
                    memory and/or on external mass storage device
                (c) A system console panel with local and remote
                    diagnosis capability

Figure 4.1  MCF Built-In-Test Equipment Configuration

Figure 4.2  AN/GYQ-21(V) Main Computer Chassis No. 1 Block Diagram with
Additional Built-In-Test Equipment [9]

Expansion Chassis Block Diagram with Additional Built-In-Test Equipment [12]


B.  Interface between Module Level BIT and Chassis Level BIT Module.
This interface is required to communicate the faults detected by
the Module Level BIT circuits to the Chassis Level BIT Module.

C.  Interface between Expansion Chassis Level BIT Modules (ECBIT) and
Main Computer Chassis Level BIT Modules (MCBIT). This interface
provides a certain degree of redundancy in the fault reporting
mechanism. It will be used in case the on-line fault logging is
to be concentrated at the MCBIT level rather than at the system
level.

D.  Interface between the Chassis Level BIT Module and the Chassis
Maintenance Panel. This interface will allow fault signatures to
be displayed on the Chassis Maintenance Panel and allow the
operator to manually enable chassis level tests.

E.  Interface  between  the  Main  Computer  Chassis  BIT  Module  (MCBIT)  and 
the  CPU  module.  Because  of  the  inherent  complexity  of  the  CPU 
functions,  it  is  felt  that  this  additional  interface  (in  addition 
to  the  interface  A)  will  be  required  to  test  the  CPU  logic  and 
internal  data  paths. 

In defining the above interfaces, an attempt has been made to
distinguish between them on a functional basis. In actual implementation,
however, some of these interfaces may be combined to form a single bus, or
the existing MCF buses may be expanded to incorporate the additional signal
paths.

4.6  Fault  Communication  Concepts 

Once the hierarchical structure for the built-in-test has been
established, there are several ways in which faults may be communicated
between the various BIT levels. Three basic fault reporting schemes are
shown in Figure 4.6.

The choice of any one of these fault reporting schemes is dependent
upon the nature of the application for which the MCF computer is being
used. In Figure 4.6 (a) the faults are reported directly to the CPU.
Immediate action can, therefore, be taken insofar as error handling is
concerned. This scheme is useful in applications where a system
malfunction can produce harmful results, such as in automatic feedback
control systems. In contrast, in the scheme shown in Figure 4.6 (c), all
faults are reported to the CPU via the Main Computer Chassis BIT Module
(MCBIT). In this case, the MCBIT can perform the fault logging and
characterization functions and report only the critical failures to the
CPU. Such schemes may be more desirable in those real-time applications
where the CPU time is more precious but failures in non-critical components
can be tolerated. The fault reporting mechanism in Figure 4.6 (b) is a
compromise between the two above-mentioned schemes. It has the advantage
that at least the fault communication functions for the Expansion Chassis
and the Main Computer Chassis Level BITs are identical.

In the interest of flexibility, instead of selecting any one of the
above fault communication schemes, user-programmable reporting hardware
could be provided whereby any one or a combination of the above schemes may
be selected by the user depending upon the application.

4.7  Generalized  Functional  Flow  Charts  for  the  Built-In-Tests 

In this section, flow charts are presented to clarify the operation of
the built-in-test mechanism at the three hierarchical levels. The flow
charts for the Module, Expansion Chassis, Main Computer Chassis, and the
System Level BITs are given in Figures 4.7, 4.8, 4.9, and 4.10,
respectively.

These flow charts are almost self-explanatory. They describe the
sequence of events that occur at the various BIT levels in the process of
fault detection and its communication from the lowest to the highest level.
An important assumption made in these flow charts is that the fault
communication occurs via an interrupt mechanism from the lower levels to
the higher levels, while the fault logging occurs via a polling mechanism
where the higher levels obtain the error information from the lower levels
by reading their appropriate error status registers. This assumption was
made to allow the fault communication to be user programmable by enabling
or disabling the appropriate interrupts and/or polling when desired under
application program control. In practice, various combinations of
interrupt and polling mechanisms may be used for keeping track of the
health status of the system.
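
The interrupt-plus-polling assumption can be sketched in a few lines of
Python. This is only an illustrative model, not part of the flow charts:
the class names, the single integer error status register, and the
enable flags are all hypothetical stand-ins for the hardware described
above.

```python
from dataclasses import dataclass, field


@dataclass
class LowerLevelBit:
    """Hypothetical lower-level BIT unit with one error status register."""
    name: str
    error_status: int = 0            # nonzero means a fault has been latched
    interrupt_enabled: bool = True   # user-programmable interrupt enable

    def detect_fault(self, code: int, higher: "HigherLevelBit") -> None:
        """Latch the fault and, if enabled, interrupt the higher level."""
        self.error_status |= code
        if self.interrupt_enabled:
            higher.on_interrupt(self)


@dataclass
class HigherLevelBit:
    """Hypothetical higher-level BIT that logs faults by reading registers."""
    polling_enabled: bool = True
    fault_log: list = field(default_factory=list)

    def on_interrupt(self, source: LowerLevelBit) -> None:
        # The interrupt only signals that a fault exists; the details are
        # obtained by reading the source's error status register.
        self._log(source)

    def poll(self, units: list) -> None:
        # Polling pass: read every register and log the nonzero ones.
        if not self.polling_enabled:
            return
        for unit in units:
            if unit.error_status:
                self._log(unit)

    def _log(self, unit: LowerLevelBit) -> None:
        entry = (unit.name, unit.error_status)
        if entry not in self.fault_log:
            self.fault_log.append(entry)


chassis = HigherLevelBit()
memory = LowerLevelBit("memory")
io = LowerLevelBit("io", interrupt_enabled=False)

memory.detect_fault(0b01, chassis)   # reported at once via interrupt
io.detect_fault(0b10, chassis)       # latched only, interrupt disabled
chassis.poll([memory, io])           # the io fault is picked up by polling
print(chassis.fault_log)
```

Disabling the interrupt on one unit while leaving polling enabled shows
how the two mechanisms can be combined under program control.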

These flow charts have been generalized to cover on-line, idle time,
and off-line fault monitoring techniques. They also show the stand-alone,
self-test mode at the chassis level.

Figure 4.7  Module Level BIT (MBIT) Functional Flow Chart for
On-Line and Off-Line Cases

Figure 4.8  Expansion Chassis Level BIT (ECBIT) Functional Flow Chart
for On-Line, Idle Time, and Off-Line Cases

Figure 4.10  System Level BIT (SBIT) Functional Flow Chart
for On-Line, Idle Time, and Off-Line Cases

4.8  Basic System Test Configurations

One of the motivations in recommending a Chassis Level BIT is to
provide a stand-alone, self-test capability for each chassis within the
system. An advantage of this is that if hardware in a certain chassis is
suspected to be malfunctioning, that chassis may be disconnected from the
system for test purposes. While the chassis is being tested in self-test
mode to localize the fault, the system can at least be operated in a
reduced configuration. This way the entire system resources need not be
tied up during maintenance. Figure 4.11 (a) depicts the test configuration
for the stand-alone chassis level tests.

In case of a complete system failure, the testing should begin in a
building-block fashion. First, the System Console Panel should be used to
check out the hard core logic (microsequencer and micromemory operations).
Following that, microdiagnostics can be invoked to test the basic CPU
functions. If necessary, microdiagnostics may be loaded into micromemory
from the external mass storage device via the system console panel. This
basic CPU test configuration is shown in Figure 4.11 (b). Microdiagnostics
may also be used to check the cache memory (if present), main memory and
its associated data paths.

The remaining portions of the system can then be tested under CPU
control using macrodiagnostic software as shown in Figure 4.11 (c). The
macrodiagnostics should include functional and reliability tests for all
modules and system peripheral devices (disks, magtapes, line printers,
etc.).

4.9  Built-In-Test  Functional  Specifications 

In  view  of  all  the  discussions  regarding  the  3-level  approach  for 
built-in-tests  presented  so  far,  the  following  more  detailed  functional 
specifications  can  now  be  formulated. 

4.9.1  Module  Level  Built-In-Tests 

1.  Continuously detect, using additional test hardware logic, faults
within the module when the module is on-line. This should be done
by partitioning the module into simpler logical subfunctions and
providing a passive test logic for each subfunction.

2.  Log the type of error detected in error status register(s) on the
module. The error status register(s) should be accessible to the
higher level BITs. Furthermore, an error status register once set
should only be cleared by either the system reset signal or by
command from the higher level BITs.

3.  Be  capable  of  reporting  the  fault  condition  via  user  programmable 
hardware  interrupt  option  to  the 

(a)  System  Level  BIT  in  the  CPU  module 

(b)  Chassis  Level  BIT  Module. 

4.  Report all detected faults to the operator via non-volatile
indicators placed on the module.

5.  Be capable of aiding the Chassis and System Level BITs via
programmable hardware logic to conduct fault detection and isolation
tests when the module is in idle state or off-line. The faults
being detected may lie

(a)  within the module

(b)  in external data paths (input/output lines) connected to
the module

(c)  in other modules.

The additional hardware logic used for this purpose should be
programmed through maintenance register(s) or maintenance bits in
error status register(s) specially provided for this purpose.

6.  Measure and record separately on each module in non-volatile form
the accumulated elapsed time for which the power has been applied
to that module.
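
The latching behavior asked of the error status register in Items 2 and 5
can be illustrated with a small Python model. It is a sketch under stated
assumptions: the 8-bit layout, the choice of the top two bits as
maintenance bits, and the source names passed to `clear` are all
hypothetical and are not taken from any MCF specification.

```python
class ErrorStatusRegister:
    """Sketch of a latching error status register (all names hypothetical)."""

    # Sources that are allowed to clear latched error bits (Item 2).
    CLEAR_SOURCES = ("system_reset", "higher_level_bit")

    def __init__(self, maintenance_mask: int = 0b1100_0000) -> None:
        self.maintenance_mask = maintenance_mask  # assumed bit assignment
        self.value = 0

    def latch_error(self, error_bits: int) -> None:
        # On-module test logic can only set error bits, never clear them.
        self.value |= error_bits & ~self.maintenance_mask

    def write_maintenance(self, bits: int) -> None:
        # A higher level BIT programs the test-aid logic through the
        # maintenance bits without disturbing the latched error bits.
        error_part = self.value & ~self.maintenance_mask
        self.value = error_part | (bits & self.maintenance_mask)

    def read(self) -> int:
        return self.value

    def clear(self, source: str) -> None:
        # Only the system reset or a higher level BIT command may clear.
        if source not in self.CLEAR_SOURCES:
            raise PermissionError(f"{source} may not clear the register")
        self.value &= self.maintenance_mask


reg = ErrorStatusRegister()
reg.latch_error(0b0000_0001)        # first fault latched
reg.latch_error(0b0000_0100)        # second fault accumulates
reg.write_maintenance(0b0100_0000)  # enable a test feature
print(bin(reg.read()))
reg.clear("higher_level_bit")
print(bin(reg.read()))
```

Keeping the error bits sticky until an explicit clear is what lets a
higher level read the fault later by polling, even if the fault itself
was transient.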

4.9.2  Chassis  Level  Built-In-Tests 

1.  Continuously  monitor  all  faults  being  reported  by  lower  level 
BITs. 

(a)  For  the  Expansion  Chassis  Level  BIT  the  lower  levels  are  the 
BITs  on  the  modules  within  the  chassis. 

(b)  For  the  Main  Computer  Chassis  Level  BIT  the  lower  levels  are 
the  BITs  on  the  modules  within  the  Main  Computer  Chassis  and 
the  Expansion  Chassis  Level  BIT  Modules. 


2.  Log the type of fault reported and the identity of the module
reporting it in a non-volatile store on the Chassis Level BIT Module.
This may be done by reading the error status register(s) of
the faulty module.

3.  Be capable of characterizing the reported faults into at least
repetitive or non-repetitive classes (stuck-at or transient faults)
by counting the number of occurrences. This can be done by
analyzing the fault log on the Chassis Level BIT Module.

4.  Be  capable  of  reporting  the  fault  conditions  via  user  programmable 
hardware  interrupt  option  to  the  higher  level  BITs. 

(a)  For  the  Expansion  Chassis  Level  BIT  the  higher  levels  are  the 
Main  Computer  Chassis  Level  BIT  Module  and  the  System  Level 
BIT  in  the  CPU  module. 

(b)  For  the  Main  Computer  Chassis  Level  BIT  the  higher  level  is 
the  System  Level  BIT  in  the  CPU  module. 

5.  Report all faults or fault signatures to the operator via
indicators, numeric displays, or a small alpha-numeric printer on the
Chassis Maintenance Panel.

6.  Be capable of executing a selected test or tests (resident in
non-volatile memory on the Chassis Level BIT Module) to detect and
isolate hardware faults in all modules within the chassis if
commanded by the operator via the Chassis Maintenance Panel when the
chassis is in stand-alone mode.

7.  Be capable of executing idle time test or tests (resident in
non-volatile memory on the Chassis Level BIT Module) to detect and
isolate hardware faults on all modules within the chassis when
commanded by the System Level BIT via the CPU.

8.  Measure  and  record  separately  on  each  chassis  in  non-volatile  form 
the  accumulated  elapsed  time  for  which  power  has  been  applied  to 
that  chassis. 
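
The characterization step in Item 3 amounts to counting repeated entries
in the fault log. A minimal Python sketch follows; the log entry format
(module identity, fault type) mirrors Item 2, while the function name and
the threshold of three occurrences are assumptions chosen only for
illustration.

```python
from collections import Counter


def characterize(fault_log, repeat_threshold=3):
    """Classify each (module, fault_type) pair by its occurrence count.

    repeat_threshold is a hypothetical tuning value: a fault seen at
    least this many times is treated as repetitive (stuck-at-like),
    otherwise as non-repetitive (transient-like).
    """
    counts = Counter(fault_log)
    return {
        key: "repetitive" if n >= repeat_threshold else "non-repetitive"
        for key, n in counts.items()
    }


# Example log as it might be accumulated on a Chassis Level BIT Module.
log = [
    ("memory-2", "parity"),
    ("io-1", "timeout"),
    ("memory-2", "parity"),
    ("memory-2", "parity"),
]
for key, label in characterize(log).items():
    print(key, label)
```

In a real design the threshold would probably also depend on the time
window over which the occurrences are counted, which this sketch ignores.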

4.9.3  System  Level  Built-In-Tests 

1.  Continuously monitor all faults being reported by lower level
BITs, i.e., all Module Level BITs within the system, and the
Expansion Chassis and Main Computer Chassis Level BIT Modules.

2.  Log the type of fault reported and the identity of the source
reporting it in a non-volatile store in the main memory or a file on
a mass storage device.

3.  Be capable of characterizing the reported faults into at least
repetitive or non-repetitive classes (stuck-at or transient) by
counting the number of occurrences. This can be done by analyzing
the fault log mentioned in Item 2.

4.  Be  capable  of  reporting  fault  conditions  (via  user  programmable 
software  option)  to  the  operator  in  one  or  more  of  the  following 
ways. 

(a)  Indicators  on  the  System  Console  Panel 

(b)  Indicators  on  the  Main  Computer  Chassis  Maintenance  Panel 

(c)  Printed  message  on  any  system  output  device. 

5.  Be capable of executing a selected test or tests, resident in
non-volatile system memory, to detect and isolate faults in all
modules within the system when commanded by the operator via the
System Console Panel.

6.  Be capable of executing test or tests, resident in non-volatile
system memory, to detect and isolate faults in all modules within
the system when initiated by the user via software instructions.

7.  Be capable of executing a selected test or tests (resident in
non-volatile micro-memory of the CPU module) to detect and isolate
faults in the CPU module when initiated in one of the following
ways:

(a)  by  the  user  via  a software  instruction 

(b)  by  the  operator  via  the  System  Console  Panel 

(c)  by  the  Main  Computer  Chassis  Level  BIT  module. 

4.10  Summary  of  the  Built-In-Test  Functions 

The built-in-test features discussed in this section for the MCF
computer systems are summarized in Table 4.2. The seven major functions
discussed in this section are listed in the left-most column of Table 4.2.
The responsibilities of the Module, Chassis, and System Level BITs during
on-line, idle time, and off-line testing are shown in adjacent columns.


TABLE 4.2.  SUMMARY OF THE BUILT-IN-TEST FUNCTIONS FOR THE
MCF MODULE, CHASSIS AND SYSTEM LEVEL BITS.

Functions

   1.  Detection
   2.  Isolation
   3.  Indication to Operator
   4.  Reporting to Higher Level BIT
   5.  Logging in Local Storage
   6.  Characterization (Stuck-at/Transient)
   7.  Error Handling

Test Domain

   Module Level BIT     All hardware functions within a module
   Chassis Level BIT    All modules within a chassis
   System Level BIT     All modules within a system


5.0  BUILT-IN-TEST RESOURCE ALLOCATION BASED ON FAILURE RATE ANALYSES


It is important in the design of maintainable digital computer systems
to know which subsystems are most likely to fail. To do this precisely
requires detailed knowledge of the particular circuits to be used in the
synthesis of various parts of the computer system. In cases where the
exact hardware embodiment is not known, it is reasonable to extrapolate
from failure rates predicted for closely related machines.

A first cousin of the Military Computer Family member, AN/UYK-41, is
the Digital Equipment Corporation's PDP-11/70. The DEC PDP-11/70 is
therefore used in the following analysis to make inferences about the
functional areas within the AN/UYK-41 which are most likely to fail. In
addition, this same reasoning can be used to predict the failure rate
increase for representative BIT approaches. The following sections
summarize the PDP-11/70 analysis.


5.1  Objectives of the Failure Rate Analyses

The two major objectives of the failure rate analysis were: 1) to
identify specific areas of the computer system that are most prone to
failure, and 2) to predict the impact of BIT on the total system's failure
rate. The basic premise here is: once the specific areas (modules,
subsystems) that have high failure rates are found, these areas can be
given the emphasis in the allocation of BIT resources. In this manner, the
smallest amount of BIT hardware will detect the greatest number of errors.
Another result of this analysis is the identification of certain modules
that have a failure rate so high that error correcting hardware may need
to be included in the design to meet the MTBF specification. The
failure rate model used in this analysis is that of MIL-HDBK-217B. For a
more complete description of the model or definitions of various
parameters, refer to that handbook.

5.2  Results  of  DEC  PDP-11/70  Failure  Rate  Analysis 

Using a computer program developed at Carnegie-Mellon University
(CMU) called Autofail, it is possible to compute the failure rate of each
board in the PDP-11/70 mainframe. While the 11/70 is not a military
computer, it represents a commercial machine with similar performance
specifications to the MCF machines. The CMU program utilizes the
electronic components failure rate model in MIL-HDBK-217B. The computer
program does not use the exact modifications that are included in the two
revisions to 217B; rather, it uses a slightly different modification that
approximates them.

Tables 5.1 and 5.2 represent a summary of this analysis. The computer
was divided into four units: CPU, floating point processor, cache, and
main memory. For each of these units, the number of boards, the failure
rate, and the percentage of total failure rate are given. Table 5.1 gives
this information for the computer modules at an ambient temperature of
25°C, whereas Table 5.2 gives this same information at 85°C. The latter
temperature was chosen because it is the specification in the ITEC
documents [14].

One can easily see that memory failures dominate (92%) all others at
high temperatures and are still a large percentage (50%) even at room
temperature. The great influence that temperature has on the relative
failure rate is discussed in Section 5.3. The next largest unit that is
prone to failure is the CPU. Its failure rate is about twice that of the
other two units. The numbers in Tables 5.1 and 5.2 represent a computer
with no built-in-test, so the numbers are not influenced by additional BIT
hardware. In order to provide the best fault coverage at the lowest cost,
the two areas that deserve the most consideration are the main memory and
the CPU. By providing a high level of confidence in the proper operation of
these two vital areas (through the use of BIT hardware) a high level of
confidence in the operation of the computer is achieved. In fact, if these
two vital areas are functioning properly, they can be used to check the
other two areas: the cache memory and the floating point processor. For
more detailed information on the boards and components in each of the
segments of the computer, a complete listing is included in Appendix B.
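
The percentage column of Table 5.2 and the resulting BIT priority order
can be reproduced with a short calculation. The sketch below uses the
85°C failure rates exactly as listed in Table 5.2; the function names are
illustrative only and are not part of the Autofail program.

```python
# Failure rates copied from Table 5.2 (PDP-11/70 at 85 deg C),
# in the units used by that table.
failure_rates = {
    "Central Processing Unit": 383,
    "Floating Point Processor": 263,
    "Cache Memory": 199,
    "Main Memory": 9910,
}


def failure_shares(rates):
    """Return each unit's share of the total failure rate, in whole percent."""
    total = sum(rates.values())
    return {name: round(100 * rate / total) for name, rate in rates.items()}


def rank_for_bit(rates):
    """Order units from most to least failure-prone (hypothetical helper)."""
    return sorted(rates, key=rates.get, reverse=True)


shares = failure_shares(failure_rates)
priority = rank_for_bit(failure_rates)

print(sum(failure_rates.values()))   # total failure rate of the four units
for name in priority:
    print(f"{name:26s} {failure_rates[name]:6d} {shares[name]:4d}%")
```

Ranking by failure rate puts main memory first and the CPU second, which
is the ordering the text uses to argue where BIT hardware should go.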

5.3  Comparison  of  Failure  Rates  Between  MOS  and  Bipolar  Technologies 

There is a large difference between semiconductor technologies with
respect to the effect of temperature on the failure rate. This fact was
shown in Section 5.2 where the memory portion of the computer accounted for
nearly all the failure rate (92%) at high temperature (85°C), but not at
lower temperature. The memory boards are made up of mostly MOS chips that
have a large failure rate acceleration with temperature. In contrast, the
CPU boards are composed mainly of SSI and MSI chips whose failure rate has
a much lower temperature dependence.


TABLE 5.1.  FAILURE RATE OF PDP-11/70 AT 25°C

                                  Number of
Subsystem                         P.C. Boards
------------------------------    -----------
Central Processing Unit                9
Floating Point Processor               4
Cache Memory (1K by 16 bits)           4
Main Memory (64K by 16 bits)           4
Total                                 25


TABLE 5.2.  FAILURE RATE OF PDP-11/70 AT 85°C

                                  Number of     Failure Rate   Percentage of Total
Subsystem                         P.C. Boards   (/10^6 Hr.)    Failure Rate (%)
------------------------------    -----------   ------------   -------------------
Central Processing Unit                9             383                4
Floating Point Processor               4             263                2
Cache Memory (1K by 16 bits)           4             199                2
Main Memory (64K by 16 bits)           4            9910               92
Total                                 25           10755              100

From an examination of the reliability model of MIL-HDBK-217B, one
notices that bipolar technology has one temperature acceleration function
whereas MOS technology has a different (larger) one. Figure 5.1 shows
these two temperature factors (π_T). It is clear from this figure that the
reliability of MOS devices degrades substantially at elevated temperatures.
This means that a significant failure rate reduction can be achieved simply
by keeping the ambient temperature a few degrees cooler.

To illustrate the impact that technology and temperature make on
reliability, Figure 5.2 graphically represents the failure rate as a
function of temperature for the PDP-11/70. The components of this computer
are classified into one of two groups: 1) ROM and RAM, or 2) SSI and MSI.
The ROM and RAM group is predominantly MOS and is made up of 472 chips. The
SSI/MSI group is predominantly bipolar and is composed of 1870 chips. The
larger number of SSI/MSI chips have about the same failure rate at room
temperature as the memory because they are largely simple functions that
each have a low failure rate. The memory chips are much more complex (i.e.,
more gates/chip) and, therefore, have a higher failure rate per chip.
However, an interesting situation arises when the temperatures of each are
increased. The memory's failure rate increases dramatically while the
failure rate of the SSI/MSI group increases only slightly. This illustrates
the fact that at elevated temperatures the memory portion of a computer
will account for nearly all the hardware failures.

5.4  Failure  Distributions  of  Other  Computers 

To get a somewhat broader view of reliabilities from a variety of
computers, the following information is given. This reliability data was
obtained from Carnegie-Mellon University from their continuing research on
computer reliability. The data was derived from the model in MIL-HDBK-217
for use in commercial environments. To summarize the data in the following
table, one can estimate that in a computer system (with peripherals not
included) 60% of the failures will be in the memory, 30% will be in the CPU
and 10% will be in the power supply.

Figure 5.1  Curves of Temperature Factors

The computer systems that were examined include a PDP-10 with a 256K
memory, a PDP-11/40 with 28K memory, an LSI-11 with 28K memory, and two
multiprocessors at CMU, the C.mmp and the Cm*. These computers represent a
broad range of computer applications. The PDP-10 is an example of a large
time-sharing machine, the PDP-11/40 is an example of a typical
minicomputer, and the LSI-11 represents a microcomputer. The C.mmp has 16
interconnected PDP-11/40 processors and a total core memory capacity of one
million words. The Cm* multiprocessor has eight LSI-11 processors each with
28K words of semiconductor memory. The individual data is shown in
Table 5.3 [22].

5.5  Conclusion  and  Recommendations  for  BIT  Resource  Allocation 

The area that is most likely to fail is the logical place for
built-in-test capability. It has been shown that at high temperatures
memory module reliability degrades to a great extent and dominates the
failure rate of all other areas. It is, therefore, logical to provide
error-correcting hardware for the memory modules to increase the MTBF of
the entire computer system. Error-correcting hardware is not BIT in the
strictest sense, but error-correction directly affects maintainability and
maintainability is what BIT is all about. A complete discussion of the
recommended BIT for memory is given in Section 6.1.

The area that is next most prone to failure is the CPU. The
organization and function of the CPU is not simple, which implies that the
BIT techniques will not be simple. The CPU performs many functions which
require a variety of BIT techniques to provide a BIT capability that is
both inexpensive and effective. The complete recommendation of BIT
techniques for the CPU is given in Section 6.2.


TABLE 5.3.  FAILURE RATES FOR VARIOUS COMPUTER SYSTEMS

                                  Failure Rate    % of
System                            /10^6 Hours     Total
------------------------------    ------------    -----
PDP-10
   Processor                           3156         27
   Memory (256K)                       6658         58
   RP-10 Disk Controller                625          5
   Two DF-10 Data Channels             1135         10
   Total                              11550        100

PDP-11/40
   Processor                             57         30
   Memory (28K)                         108         57
   Power Supply                          25         13
   Total                                190        100

LSI-11
   Processor                             67         27
   Memory (28K)                         154         63
   Power Supply                          25         10
   Total                                246        100

C.mmp
   Processors                          1008         18
   Memory (1000K)                      3904         65
   Switch                               202          3
   Power Supply                         800         13
   Total                               5994         95

Cm*
   Processors and 32 Memory             880         33
   Memory (192K)                        896         33
   Other                                656         24
   Power Supplies                       250          9
   All of Memory                       1392
   Total                               2682         99


5.6  Reliability Impact of Chassis Level BIT Module

In an attempt to quantify the impact on reliability of an added BIT
module, an example of such a configuration has been constructed. The model
uses a PDP-11/70 minicomputer to represent an MCF chassis, and to this is
added an LSI-11 microcomputer to represent a BIT module. The LSI-11 is
easily capable of performing all the error detection, error handling, and
error reporting tasks that have been discussed previously. The PDP-11/70 is
closely related in performance to a single chassis configuration of an MCF
computer.

Using the reliability model in MIL-HDBK-217B, the failure rate of each
part of this computer system was determined. To perform these calculations,
a computer program at Carnegie-Mellon University was used. The printouts of
this analysis of a PDP-11/70 with a chassis (console) BIT module are
included in Appendix B. To summarize the findings from this analysis, the
basic result is that the failure rate of the BIT module (the LSI-11) is
only five percent of the total failure rate. The failure rate calculation
was repeated at several temperatures between 25°C and 85°C. The result
indicated that the relative BIT module failure rate changed only slightly
in relation to the whole chassis. Another interesting fact that resulted
from this analysis is that roughly half of the failure rate of both the
PDP-11/70 and the LSI-11 was from the memory. The 11/70 has a 64K word
memory and the LSI-11 has a 4K word memory. At elevated temperatures the
failure rate of the memory increased more rapidly than the rest of the
components, but the BIT module with its memory increased at approximately
the same rate as the 11/70.

This analysis has shown that it is possible to provide an effective BIT
module to an MCF computer chassis at a cost to the chassis failure rate of
only 5%. The single circuit board sized microcomputer that served as an
example of a BIT module could easily provide the built-in-test function
needed in each chassis.
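
The 5% figure can be read as a simple ratio of failure rates. The sketch
below is illustrative only. The LSI-11 rate of 246 failures per 10^6 hours
is borrowed from Table 5.3 and is assumed to apply to this configuration.
The PDP-11/70 rate is a hypothetical placeholder (the real value is in the
Appendix B printouts), chosen so that the ratio lands near the reported 5%.

```python
def bit_share(chassis_fr, bit_module_fr):
    """Fraction of the total failure rate contributed by the BIT module.

    Both rates are in failures per 10^6 hours and are assumed to add,
    i.e. the chassis and the BIT module are treated as a series system.
    """
    return bit_module_fr / (chassis_fr + bit_module_fr)


LSI11_FR = 246.0     # Table 5.3 total for an LSI-11 (assumed to apply here)
PDP1170_FR = 4674.0  # hypothetical placeholder, not a value from the report

share = bit_share(PDP1170_FR, LSI11_FR)
print(f"BIT module share of total failure rate: {share:.1%}")
```

The same function can be re-evaluated with rates computed at other
temperatures to see how little the share moves.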


6.0 RECOMMENDATIONS FOR THE MODULE LEVEL BUILT-IN-TEST FOR THE MCF
AN/UYK-41(V) COMPUTER SYSTEM

In  this  section,  built-in-test  approaches  for  the  following  types  of 
modules  have  been  considered: 

1.  Memory  Subsystem  Modules 

2.  Central  Processing  Unit  Module 

3.  Bus  Modules 

4.  Input/Output  Modules 

5.  Analog  Modules 


It  should  be  recalled  from  Section  5.0  that  a large  percentage  of  the 
system  malfunctions  are  due  to  failures  in  the  memory.  The  failures  in  the 
CPU  and  the  power  converters  constitute  most  of  the  remaining  failures. 
Thus,  it  would  be  most  advantageous  to  concentrate  the  BIT  resources  in 
these  modules. 

The Memory Subsystem consists of 32K x 36 bit volatile or non-volatile
random access memory modules. The memory modules include their own
read/write control circuitry. The number and type of modules used in a
system would depend on the memory requirements of the application. For the
memory modules, simple parity and single error correcting, double error
detecting codes are discussed.

The Central Processing Unit consists of one CPU3 module for a single
processor system and two CPU3 modules for a dual processor system. The
CPU3 module has been designed to operate in both modes and its prime
function is to emulate the AN/UYK-41(V) (PDP-11/70) instruction set. The Bus
modules consist of the Bus Extender Module (BEM) and the Bus Interface
Module (BIM2). The BEM is used in multiple chassis configurations to extend
the MCF bus system for interchassis communications. The BIM2 has been
designed to convert the MCF bus to the AN/UYK-41(V) bus (UNIBUS) so that
non-MCF peripherals compatible with the UNIBUS may be used on the MCF
computer systems.

The Input/Output (I/O) modules are used to interface the MCF computer
system with external devices. The ITEK documentation does not as yet
provide detailed specifications for the I/O modules. However, it is
assumed that there will be several different types of I/O modules depending
on the interfacing requirements of the external devices.

The CPU3 and the BIM2 are among the most complex modules in the MCF
computer system. They perform several interdependent functions. Most of
these functions require generation of critical timing and control signals.
Also, all of the above mentioned modules perform, as a part of their
overall functions, interactive communication among themselves as well as
with other MCF modules. Due to this variety, complexity and interdependence
of the functions in these modules, no single BIT approach can be used to
provide substantial fault coverage. Rather, a set of BIT approaches must be
used.

The analog modules in the MCF computer system are the Power Converter
Modules (PCM) and the Fan Assembly. The built-in-tests for the analog
modules require a different approach. For the analog modules, the types of
parameters to be monitored, such as the output voltages and the
environmental, thermal and mechanical parameters, are discussed.
Alternative ways to implement the BIT hardware on the analog modules are
also presented.

For the MCF computer systems, three levels of built-in-test (system,
chassis, and module levels) have been discussed in Section 4.0. Although
these three levels of built-in-test have their own respective
responsibilities, there is some degree of interaction among them, and as
such they cannot be treated as three entirely different approaches.

The BIT approach discussed here for the above mentioned modules
pertains mainly to the module level built-in-tests. The module level BIT
consists mainly of those testing techniques that require additional BIT
circuitry (hardware, firmware) resident on the modules. The main purpose
of this additional BIT circuitry is to monitor (detect) and to aid in
diagnosing (isolating) hardware faults. These hardware faults may be within
the module or in the intermodule communications paths via the system bus.
These fault detection and isolation functions of the module level BIT are
discussed in this section.

In addition to the fault detection and isolation functions, the
responsibilities of module level BIT include fault indication, logging, and
reporting. These aspects of module level BIT have not been considered here
at this time.

At the module level, the overall approach for analyzing the BIT
requirements is as follows:

1.  Partition the hardware on the module by functions. These functions
    should be as loosely coupled as possible. Tightly coupled (or
    interdependent) functions make fault detection and isolation more
    difficult.

2.  Begin with the most basic (lowest level, innermost) function.

    (a)  Determine techniques to test that function with adequate
         fault coverage, making maximum use of internal observability
         of signals.

    (b)  If 2(a) is not possible or fault coverage is inadequate,
         determine ways of providing functional duplication for the
         whole or a part of the function as a means for testing.

    (c)  If neither 2(a) nor 2(b) is feasible, or external testing of
         the module functions is desirable, provide an externally
         accessible test mechanism. This may vary from simple test
         points to read/write maintenance registers. The purpose of
         this is to increase the observability and controllability of
         internal signals.

3.  Repeat Step 2 with increasingly higher level functions (those
    dependent on the basic functions for their operation). This
    maximizes the use of the building block approach to testing. The
    more basic functions can, if working properly, be used for testing
    higher level functions.
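
The ordering implied by Steps 2 and 3 can be sketched as a dependency sort:
each function is tested only after the more basic functions it relies on.
The sketch below is a minimal illustration. The function names follow the
CPU3 partitioning used later in this section, but the dependency map itself
is an assumption made for the example, not something given in the ITEK
documents.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later

# Hypothetical dependency map: each function lists the more basic
# functions it needs in order to operate.
DEPENDS_ON = {
    "clock": set(),
    "microsequencer": {"clock"},
    "control_store": {"clock", "microsequencer"},
    "timing_and_control": {"clock", "control_store"},
    "data_paths": {"timing_and_control"},
    "alu": {"data_paths", "timing_and_control"},
}


def bit_test_order(depends_on):
    """Return the functions ordered so each follows everything it needs."""
    return list(TopologicalSorter(depends_on).static_order())


order = bit_test_order(DEPENDS_ON)
print(" -> ".join(order))
```

A tightly coupled design shows up here as a cycle in the map, which
`TopologicalSorter` rejects. This mirrors the remark in Step 1 that
interdependent functions make fault isolation more difficult.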

In view of the above mentioned approach for analyzing the BIT
requirements at the module level and the overall MCF BIT requirements
discussed in earlier sections, the built-in-test features for the memory,
CPU3, BEM, BIM2, I/O and analog modules are described in the following
sections.
6.1 Memory Modules

Within the class of memory modules there are volatile and non-volatile
types. The volatile type using semiconductor random access memories will
be discussed first. Before the various approaches are discussed, some
background information on reliability assumptions will be given.

6.1.1  Volatile  RAM 

A block  diagram  of  a memory  module  with  error  correction  is  shown  in 
Figure  6.1.  It  identifies  the  parts  that  have  been  added  to  provide  the 
BIT  capability.  From  MIL-HDBK-217B  the  appropriate  failure  model  is  of  the 
form: 

$$R = e^{-\lambda t}$$

where:  $R$ is the reliability

        $\lambda$ is the failure rate, usually expressed in failures
        per $10^6$ hours

        $t$ is time, usually expressed in hours
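
As a worked illustration of this model, the sketch below evaluates the
reliability of an LSI-11 from the component rates in Table 5.3. The mission
time of 1000 hours is an arbitrary value chosen for the example. Because
exponents add, multiplying the component reliabilities gives the same
answer as applying the model once to the summed failure rates.

```python
import math


def reliability(failure_rate_per_1e6_h, hours):
    """R = exp(-lambda * t), with lambda in failures per 10^6 hours."""
    return math.exp(-failure_rate_per_1e6_h * 1e-6 * hours)


# Component rates for the LSI-11 rows of Table 5.3 (failures / 10^6 hours).
rates = {"processor": 67, "memory": 154, "power_supply": 25}
t = 1000.0  # hours; arbitrary illustrative mission time

product = math.prod(reliability(fr, t) for fr in rates.values())
from_sum = reliability(sum(rates.values()), t)
print(f"product of component reliabilities: {product:.4f}")
print(f"reliability from summed rates:      {from_sum:.4f}")
```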

For a non-error-correcting group of electronics, the system reliability
is the product of the reliabilities of all the components. This is more
easily computed by simply adding the failure rates of the components.
However, this method is not applicable to designs that have at least some
degree of fault tolerance built into them. In the case of memories built
with a single error correcting code, an equation of the following form is
correct:

$$R = \left[ k\,e^{-(k-1)\lambda_i t} - (k-1)\,e^{-k\lambda_i t} \right]^{w} e^{-\lambda_c t}$$

where:  $R$ is the reliability of the system

        $k$ is the total number of bits in the word
        $\lambda_i$ is the failure rate of a memory chip
        $\lambda_c$ is the failure rate of the control circuitry
        $t$ is time

        $w$ is the number of sets of chips in the system
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Figure  6.1  Block  Diagram  of  Memory  Module  with  Error  Correction 
The parameter w is computed by dividing the number of words on the memory
module by the number of bits on a chip. A chip organization one bit wide has
been assumed. This model also takes a very pessimistic view, i.e. that
of the whole chip failure model. That is, every failure within the chip
disables the entire chip. In reality, most failures within a chip
affect only one cell. With the above model, the mean time between failures
(MTBF) is defined as below, and the failure rate (FR) is taken as its
reciprocal:

$$\mathrm{MTBF} = \int_0^{\infty} R(t)\,dt$$
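
The two expressions above can be evaluated numerically. The sketch below is
a minimal illustration under stated assumptions: the reliability function
is the single error correcting form with one-bit-wide chips, each set
surviving when at most one of its k chips has failed. The chip and control
failure rates are hypothetical placeholders, not values from the
MIL-HDBK-217B runs. The integral is approximated with a plain trapezoidal
rule.

```python
import math


def sec_memory_reliability(t, k, w, chip_fr, control_fr):
    """Reliability of a single-error-correcting memory at time t (hours).

    k          -- total bits per stored word (data plus check bits)
    w          -- number of sets of k one-bit-wide chips
    chip_fr    -- failure rate of one chip, failures per hour
    control_fr -- failure rate of the control circuitry, failures per hour
    """
    one_set = (k * math.exp(-(k - 1) * chip_fr * t)
               - (k - 1) * math.exp(-k * chip_fr * t))
    return one_set ** w * math.exp(-control_fr * t)


def mtbf(reliability, t_max, steps=200_000):
    """Trapezoidal estimate of the integral of R(t) from 0 to t_max."""
    dt = t_max / steps
    total = 0.5 * (reliability(0.0) + reliability(t_max))
    for i in range(1, steps):
        total += reliability(i * dt)
    return total * dt


# Hypothetical numbers: 22-bit words, 8 sets of chips, 1 failure per
# 10^6 hours per chip, 50 failures per 10^6 hours for the control logic.
K, W, CHIP_FR, CONTROL_FR = 22, 8, 1e-6, 50e-6
estimate = mtbf(
    lambda t: sec_memory_reliability(t, K, W, CHIP_FR, CONTROL_FR),
    t_max=2e6,
)
print(f"MTBF ~ {estimate:.0f} hours, FR ~ {1e6 / estimate:.0f} per 10^6 hours")
```

The upper limit `t_max` only needs to be large enough that R(t) has decayed
to a negligible value. For a single set with no control circuitry the
integral has the closed form k/((k-1)λ) - (k-1)/(kλ), which gives a
convenient check on the numerical result.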


Based on these assumptions, the suggested BIT approach for the
volatile memory module will be a single error correcting code with double
error detection. Although the added BIT circuitry is greater than the 10-20%
goal, the reliability is not decreased by a similar amount. In fact, the
reliability will be significantly increased. The exact increase in the
number of packages depends on the size of the memory chips used. Larger
chips (16K dynamic RAM) are generally cheaper per bit, more reliable, use
less power and take up less board space. For these reasons, large chips
should be used as soon as they are able to meet military quality control
specifications. The following table (Table 6.1) illustrates the impact on
failure rate for various implementations. It is based on a bare bones
memory module with 32K words each having 16 bits of data. Because of
control circuitry differences in the final MCF implementation, the actual
cost and benefit numbers may be slightly different than those listed in
Table 6.1. The recommended BIT will store an additional six bits with
every data word.
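
Six check bits on a 16 bit word are what an extended Hamming code needs:
five bits locate any single error among 21 positions, and one overall
parity bit separates single errors from double errors. The sketch below is
one possible encoding, given only to make the mechanism concrete. The bit
layout (check bits at positions 1, 2, 4, 8, 16 and overall parity at
position 0) is an assumption and is not taken from the MCF specifications.

```python
PARITY_POS = (1, 2, 4, 8, 16)  # Hamming check-bit positions
DATA_POS = [p for p in range(1, 22) if p not in PARITY_POS]  # 16 data bits


def encode(data):
    """Return a 22-bit codeword as a list; index 0 is the overall parity."""
    word = [0] * 22
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data >> i) & 1
    for p in PARITY_POS:
        # Check bit p gives even parity over every position with bit p set.
        word[p] = sum(word[j] for j in range(1, 22) if j & p and j != p) % 2
    word[0] = sum(word[1:]) % 2  # even parity over all 22 bits
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'double'."""
    word = list(word)
    syndrome = 0
    for j in range(1, 22):
        if word[j]:
            syndrome ^= j  # XOR of the positions holding a 1
    overall_bad = sum(word) % 2 == 1
    if overall_bad and syndrome < 22:
        word[syndrome] ^= 1  # syndrome 0 means the overall parity bit
        status = "corrected"
    elif syndrome or overall_bad:
        status = "double"  # detected but not correctable
    else:
        status = "ok"
    data = sum(word[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status


codeword = encode(0xBEEF)
damaged = list(codeword)
damaged[7] ^= 1  # inject a single-bit error
print(decode(codeword), decode(damaged))
```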
TABLE 6.1. VOLATILE MEMORY MODULE BIT APPROACHES

                              Estimated     Module
                              Number of     Failure Rate     Module
BIT        Chip     Chip      Chips in      (/10^6           MTBF
Method     Size     Type      Module        hours)           (hours)

Parity     4K       static       161           489             --
ECC        4K       static       219           139            7170
Parity     16K      dynamic       61           287            3480
ECC        16K      dynamic       92           141             --

The data in Table 6.1 show that in using 4K static RAMs, the error
correcting code (ECC) requires 36% more chips than the same module with
parity. However, the MTBF has increased 250%! Using 16K RAMs, the ECC
requires 50% more chips than the module with parity (but less than half the
total number using 4K RAMs) and has an MTBF 100% greater. One can also see
from this data that with ECC the module reliabilities using 4K and 16K RAMs
are nearly the same, even though the module reliability is very different
when parity is used. This is because the failure model assumed that the
whole chip was inoperable when a failure occurred. Since there are more
memory cells in a 16K RAM, a total chip failure is much more drastic. If a
single cell failure model is used, the module reliability is increased by
over an order of magnitude and the reliability of the module with 16K RAMs
is higher than that of one built with 4K RAMs. Using the error correcting
code, the error detection coverage is greater than 95%.
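
The percentages quoted above can be reproduced approximately from the chip
counts and failure rates in Table 6.1. The sketch below assumes that the
MTBF is the reciprocal of the module failure rate, which is consistent with
the MTBF entries that are legible in the table. The results come out close
to, but not exactly at, the rounded figures in the text.

```python
# Table 6.1 rows: (BIT method, chip size) -> (chips, failures / 10^6 hours)
TABLE_6_1 = {
    ("parity", "4K"): (161, 489),
    ("ecc", "4K"): (219, 139),
    ("parity", "16K"): (61, 287),
    ("ecc", "16K"): (92, 141),
}


def percent_more(new, old):
    """Percentage by which `new` exceeds `old`."""
    return 100.0 * (new - old) / old


def mtbf_hours(failure_rate_per_1e6_h):
    """MTBF taken as the reciprocal of the failure rate (an assumption)."""
    return 1e6 / failure_rate_per_1e6_h


for size in ("4K", "16K"):
    p_chips, p_fr = TABLE_6_1[("parity", size)]
    e_chips, e_fr = TABLE_6_1[("ecc", size)]
    chips = percent_more(e_chips, p_chips)
    gain = percent_more(mtbf_hours(e_fr), mtbf_hours(p_fr))
    print(f"{size}: ECC uses {chips:.0f}% more chips, MTBF {gain:.0f}% greater")
```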

The recommended BIT for the volatile RAM will take up a relatively
large amount of board space. With the space (and power) limitations of a
RAM module from the MCF, and with present technology, it may be impossible
to implement the recommended BIT. Recognizing this fact, an alternative
BIT technique should be used until technology produces more complex
integrated circuits. The alternative BIT technique is byte parity. This
approach will detect over 95% of the faults at an added cost of only about
12%.
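
Byte parity on a 16 bit data word adds two bits, i.e. 12.5% more storage,
which matches the cost of about 12% quoted above. The sketch below shows
the mechanism. Odd parity is an assumption made here for the example, since
the parity sense for the memory modules is not stated at this point in the
text.

```python
def odd_parity_bit(byte):
    """Parity bit that gives the 9-bit group an odd number of 1s."""
    return (bin(byte & 0xFF).count("1") + 1) % 2


def store(word16):
    """Return (word, parity); parity holds one bit per 8-bit byte."""
    lo, hi = word16 & 0xFF, (word16 >> 8) & 0xFF
    return word16 & 0xFFFF, (odd_parity_bit(hi) << 1) | odd_parity_bit(lo)


def check(word16, parity):
    """True if both bytes still agree with their stored parity bits."""
    return store(word16)[1] == parity


word, parity = store(0x12A7)
print(check(word, parity))           # unmodified word passes
print(check(word ^ 0x0010, parity))  # one flipped data bit is caught
```

Parity detects any odd number of bit errors within a byte but cannot
correct them, and an even number of errors in the same byte passes
unnoticed. That is the trade against the error correcting code.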
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6.1.2  Non-Volatile  RAM 

Non-volatile memories use ferrite cores to store the digital data.
Since MIL-HDBK-217B does not give any reliability data on ferrite cores, it
is not possible to do an analysis to quantify the reliability impact of any
BIT technique. However, the volatile RAM provides the same function as the
non-volatile RAM. Therefore, one can reasonably assume that any particular
BIT approach would increase the hardware by approximately the same amount
on each type of RAM. Core memories have become a mature technology that has
been developed over many decades. This has increased their reliability and
reduced their size and cost. Indeed, core memories are more reliable than
the semiconductor ones. These reasons would indicate that the use of ECC
would be a needless expense.

The recommended BIT technique for the non-volatile memory module will
be byte parity. This will detect over 95% of the faults, but will increase
cost only by about 12%. However, in the future, it may be desirable to use
an error-correction technique on these memories also. Applications are
requiring larger and larger memory arrays and the need for long MTBF times
is increasing. If the MCF computers are going to use memory arrays near 200K
words, then an error-correcting code will be needed on the non-volatile
memory. This will be the only way to have a large system that will achieve
an adequate MTBF.

6.2 CPU3 Module

The block diagrams of the CPU3 module and the Basic Processor as
defined in the ITEK F3 specifications EL-CP-2817-MCF [23] are given in
Figures 6.2 and 6.3, respectively. From these block diagrams a natural
partitioning of the CPU3 module is as follows.
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Figure  6.2.  CPU  Block  Diagram  [23] 
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Figure  6.3.  Basic  Processor  Block  Diagram  [23] 


CPU3 Module:

   Basic Processor Logic:

      1  Processor Logic

      2  Control Panel Interface

      3  MCF Bus Controller

   Read Only Memory:

      4  256 x 36 ROM

The functional specifications on the above blocks as given in the ITEK
documentation are general, in that they specify the external interface
requirements of each block, but do not specify their internal
implementation. From the point of view of studying the BIT requirements,
certain details on the internal implementation are necessary in order to
further partition the blocks to a level where the built-in-test circuitry
can be described. For this reason, certain assumptions have been made
regarding the implementation of these blocks. These assumptions are mainly
concerned with the partitioning of the processor logic block. It is assumed
that the processor logic is implemented around a microprogrammable control
unit. This is a reasonable assumption since most of the newer generation
processors are microprogrammable, including the CDC-480 and PDP-11/70.

The block diagram in Figure 6.4 describes this assumed implementation
of the CPU3 module. Based on this assumed implementation, a more detailed
functional level partitioning of the CPU3 module may be done as follows.


CPU3 Module:

   Basic Processor Logic:

      1  Processor Logic

         1.1  Clock

         1.2  Microsequencer

         1.3  Control Store

         1.4  Timing and Control Section

         1.5  Internal Data Paths, Registers, Stacks, Files

         1.6  Arithmetic Logic Unit (ALU)

         1.7  Control and Status Registers

      2  Control Panel Interface

      3  MCF Bus Controller

   Read Only Memory:

      4  256 x 36 ROM

The  Built-In-Test  features  for  the  above  mentioned  functions  are 
discussed  below. 

6.2.1  Processor  Logic 

6.2.1.1 Clock - The clock is the most basic function. Improper
operation of multiphase clocks can result in malfunction symptoms that may
be very difficult to diagnose. Two alternative switch selectable clock
mechanisms should be provided in addition to the crystal controlled clock
used for normal operation.

A.  A variable frequency maintenance clock, which has a dual function.
    It can be used to check the operation of the regular fixed
    frequency, crystal controlled clock by substitution. Furthermore,
    it can be used to diagnose marginal timing related problems in
    other sections of the CPU logic by varying the frequency.

B.  A stepper clock to allow a single microinstruction cycle or single
    machine instruction cycle under manual control from the control
    panel. This feature has multiple uses both in maintenance and
    software development. Its usefulness is limited only by the user
    accessibility of the internal registers after each step.

6.2.1.2 Microsequencer - The microsequencer typically consists of a
microinstruction register (uIR), microprogram counter (uPC), microaddress
register (uADR), microstack registers (uStack), condition code multiplexer,
and next address generation logic. This section forms the heart of the
base machine control and its failures are also very hard to diagnose. The
following built-in-test features are recommended.

A.  Parity on all internal registers if feasible.

B.  Additional circuitry should be designed into the microsequencer
    logic to support microstep, microbreak, microrepeat, and
    microaddress set up from the system control panel. These features
    would facilitate the troubleshooting of the microsequencer logic
    by stepping through or looping on certain sections of the
    microcode.

C.  The microop-code field in the uIR should have one or two illegal
    op-codes that cause microtraps which freeze the contents of the
    uPC and suspend further control signal generation to the base
    machine. This can be used for microinstruction retry or branching
    to microdiagnostic routines. Errors from a parity check on the
    control store should also cause a microtrap.

6.2.1.3 Control Store - The control store is typically a high speed ROM.
The number of microwords and the number of bits per microword of the control
store depend on the complexity of the base machine and the microprogramming
technique. Horizontally microprogrammed machines utilize fewer words, each
with a larger number of bits. Vertically microprogrammed machines, on the
other hand, have more words with fewer bits per word. In either case,
20,000-50,000 bits of storage is fairly typical. A significant portion of
the CPU failures can be attributed to failures in the control store.

Due to the large number of bits per word and the higher cost of high speed
ROMs, error correction may not be cost effective. Speed considerations may
not permit single bit parity on an entire word because of the propagation
delays due to multiple stages in the parity generation and checking logic.
The preferred approach would be to segment the microword into several fields
of 8 to 10 bits and have a parity bit associated with each field. This would
increase the control store size by 10 to 12.5%.
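
Both numbers in this trade-off follow from the field width. One parity bit
per field of n bits costs 1/n extra storage, and a parity tree built from
two-input exclusive-OR gates needs about log2(n) gate levels. The sketch
below tabulates both. The 64 bit microword width and the use of two-input
gates are assumptions made for the illustration only.

```python
import math


def parity_overhead(field_bits):
    """Extra storage, as a fraction, for one parity bit per field."""
    return 1.0 / field_bits


def xor_tree_levels(bits):
    """Gate levels needed to reduce `bits` inputs with two-input XOR gates."""
    return math.ceil(math.log2(bits))


MICROWORD_BITS = 64  # hypothetical width, not taken from the MCF documents

for field in (8, 10):
    print(f"{field}-bit fields: {parity_overhead(field):.1%} more storage, "
          f"{xor_tree_levels(field)} XOR levels per field")
print(f"whole {MICROWORD_BITS}-bit word: "
      f"{xor_tree_levels(MICROWORD_BITS)} XOR levels")
```

Segmenting therefore shortens the checking path at the price of a few more
parity bits per microword.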

Block code correction is another possibility. However, this requires
a substantial increase in the BIT circuitry, which is cost effective only
if isolation of the failure to the bad bit (or chip) and subsequent error
correction using a retry mechanism are of importance.

In addition, the size of the control store should be increased to
provide space for storing microdiagnostic routines. It should be possible to
execute these microdiagnostics in several independent ways: 1) From user
written programs, by either calls or jumps to the microdiagnostic routines
or the execution of certain maintenance instructions. This would allow
periodic or idle time testing. 2) From the operator control panel, through
microaddress load and execute switches. This would allow bypassing the
main memory in case it is malfunctioning. 3) Directly from the CPU module,
with a switch selectable hardware jump to the start of the microdiagnostic
routine. (A similar feature is used in the PDP-11/60 diagnostic control
store option.) This feature allows bypassing the control panel in case it
is malfunctioning.

6.2.1.4 Timing and Control Section - The timing and control signals
(levels or pulses) are typically generated by judiciously combining the
contents of the microword register with the various phases of the clock
pulses. Basically, this section consists of AND/OR logic which may be
distributed across the entire CPU module. For this reason functional testing
of this logic is very difficult.

Duplication of this section of logic is a possibility. However, this
approach may not be cost effective. The preferred approach is to associate
this portion of the CPU logic with the operation of the base machine. The
timing and control signals generated by this section result in operations
such as loading, shifting and rotating the base machine registers,
enabling or disabling data path multiplexers, and selecting ALU operations.
Emphasis should, therefore, be placed on verifying the base machine
operations.

Another possibility is to provide as many test points as possible for
measurements of the timing and control signals using external test
equipment.


6.2.1.5 Internal Data Paths, Registers, Stacks, Files - This section
forms the base machine which interacts with all other modules via the MCF
bus. It fetches instructions and data from the memory, processes them and
outputs the results. During this process, the addresses, instructions, and
data are stored in intermediate registers such as the program counter (PC),
instruction register (IR), bus address register (BA), general purpose
registers (GR) and bus registers (BR), etc.

The preferred approach here is to provide parity on all such internal
registers where it is possible.

The registers in the AN/UYK-41(V) would normally be 8, 16 or 32 bits
wide. A parity bit may be associated with either an 8 bit byte or a 16 bit
word. The former with odd parity is recommended in order to remain
compatible with the four bit parity byte provided for the 32 bit address and
data information on the MCF M(X) buses. Typically, most of the data
registers are fed through the ALU even for simple instructions such as the
"move" instruction. In such a case, a parity check before the ALU and a
parity generation after the ALU would be the most cost effective way to
implement the parity tests.

Parity bits (four) must, of course, be generated and checked for the
M(X) bus address and data transfers as per the ITEK MCF bus specifications
EL-CG-2808-MCF [24].

6.2.1.6 Arithmetic Logic Unit - ALUs are typically implemented with
LSI chips which perform a wide range of functions. Therefore, it is not
possible to further partition the ALU functions for built-in-test purposes.


A residue arithmetic coding technique may be used to check most of the
ALU functions. Residue codes implemented as separate codes do not
interfere with normal ALU operation. The residue generators operate
independently on the two inputs and the output of the ALU, as shown in
Figure 6.5.

Another alternative is to duplicate the ALU logic and detect failures
by comparison. This obviously will result in more than a 100% increase in
hardware cost.

A third  alternative  is  to  check  the  ALU  functions  using  periodic  micro 
or  macrodiagnostic  routines.  The  additional  cost  here  is  to  some  extent  in 
additional  firmware  (or  software),  but  more  significantly,  it  is  in  terms 
of  system  performance  degradation  due  to  the  CPU  time  lost  in  executing  the 
diagnostics. 
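
Of the three alternatives, the residue check is the least familiar, so a
small sketch may help. It assumes a 16 bit adder and a check modulus of 3.
Both are choices made for the example and are not taken from Figure 6.5.
The residue of the sum is predicted from the residues of the two inputs and
compared with the residue of the actual output.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1
MODULUS = 3  # assumed check base; other moduli are possible


def residue(x):
    return x % MODULUS


def alu_add(a, b):
    """Model of the ALU adder: returns (16-bit sum, carry out)."""
    total = a + b
    return total & MASK, total >> WIDTH


def residue_check_add(a, b, result, carry):
    """Compare the residue predicted from the inputs with the output's."""
    predicted = (residue(a) + residue(b)) % MODULUS
    # The carry out stands for 2**WIDTH, so its residue is folded back in.
    observed = (residue(result) + carry * residue(1 << WIDTH)) % MODULUS
    return predicted == observed


a, b = 0xC3A5, 0x7F10
result, carry = alu_add(a, b)
print(residue_check_add(a, b, result, carry))           # fault-free output
print(residue_check_add(a, b, result ^ 0x0040, carry))  # one output bit wrong
```

With modulus 3 every single-bit error in the result is detected, because no
power of two is divisible by 3. Logical operations such as AND and OR do
not preserve residues, which is why the technique covers most rather than
all of the ALU functions.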


6.2.1.7 Control and Status Registers - The contents (bits) of these
registers, unlike those of the data registers, are generally neither set
simultaneously nor do they bear any relation to each other. This typically
precludes the use of parity checking.

An alternative is to duplicate some or all of these registers and
detect faults by comparison. Such duplication, as mentioned before, results
in over a 100% increase in the hardware.

The preferred approach is to make all of the control and status
registers accessible (loadable and readable) via microinstructions, macro
(machine) instructions and also via the control panel. This will
facilitate the testing of these registers through microdiagnostics,
macrodiagnostics and also manually. This same approach is also desirable for
the internal data registers described in Section 6.2.1.5. This facility
also has an important use in software development and debugging.

6.2.2 Control Panel Interface - The main function of the control
panel is to allow the user to operate the machine. A second function of
the control panel is to provide the operator a certain degree of access and
control of the internal hardware of the machine, such as load and read
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Figure  6.5  ALU  Checker 


internal  registers.  This  facilitates  to  a limited  extent  manual,  on-site 
fault  diagnosis  in  off-line  mode. 

A third function of the control panel is to aid in performing off-line
fault diagnosis from remote locations. This concept, called "Remote
Diagnosis", is becoming increasingly popular with commercial machines. The
control panel is connected to a remote test center via modems or other
communication facilities. Automatic or manual fault diagnosis can then be
done from the remote test center. In order to support such remote diagnosis
features, the control panel interface would probably require intelligent,
microprocessor controlled hardware.

In general, the control panel interface should be able to support the
following features which are useful in hardware troubleshooting as well as
software development and debugging.

1.  Load/examine  all  microsequencer  associated  registers 

2.  Provide  microstep  and  microbreak  features  for  single  stepping 
through  microinstructions 

3.  Examine/(Load if permissible) control store

4.  Load/examine  all  base  machine  associated  registers  such  as  program 
counter,  instruction  register,  general  purpose  registers,  etc. 

5.  Load/examine  main  memory 

6.  Provide  single  cycle  or  single  instruction  execution  for  stepping 
through  machine  instructions. 


6.2.3 MCF Bus Controller - There are two buses to be controlled.
They are the M(X) bus and the Event bus. The CPU3 module accesses these
buses in the same manner as any other module, via the event generation logic
for the Event bus and the bus mastership request logic for the M(X) bus.
In addition, the CPU3 module also contains hardware circuitry to monitor
the events on the Event bus and act as an arbitrator for the M(X) bus
through the bus mastership grant logic. These functions, like those of the
timing and control section, are indeed difficult to test.
A general testing scheme is to provide time-out circuits for all
communication signals that use the handshake protocol. Any time a bus
request is issued but not acknowledged within a predetermined time frame,
an error condition should be indicated. This bus time-out error could be
the result of the controller circuitry malfunctioning, an error in the bus
(backplane), or an absent or malfunctioning external device.
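
The time-out scheme amounts to a counter that starts on a request and is
cleared by the acknowledge. The sketch below models it in software, cycle
by cycle. The class name, the 8 cycle limit used in the usage lines and the
simulation helper are all hypothetical. A real implementation would be a
hardware counter on the module.

```python
class BusTimeoutMonitor:
    """Counts clock cycles between a bus request and its acknowledge."""

    def __init__(self, limit_cycles):
        self.limit = limit_cycles  # assumed design parameter
        self.waiting = False
        self.elapsed = 0
        self.error = False

    def request(self):
        self.waiting, self.elapsed, self.error = True, 0, False

    def tick(self, ack):
        """Advance one clock cycle; `ack` is the sampled acknowledge line."""
        if not self.waiting:
            return
        if ack:
            self.waiting = False
            return
        self.elapsed += 1
        if self.elapsed >= self.limit:
            self.waiting = False
            self.error = True  # latch the bus time-out indication


def run(monitor, ack_cycle, cycles=20):
    """Simulate a request whose acknowledge arrives at `ack_cycle` or never."""
    monitor.request()
    for cycle in range(cycles):
        monitor.tick(ack_cycle is not None and cycle >= ack_cycle)
    return monitor.error


print(run(BusTimeoutMonitor(8), ack_cycle=3))     # acknowledged in time
print(run(BusTimeoutMonitor(8), ack_cycle=None))  # device absent
```

The latched error does not say which of the three possible causes is
responsible. Isolation needs additional evidence, such as the loop around
and parity checks described for the bus modules.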

6.2.4 Read Only Memory - This 256 x 36 bit ROM has already been
provided with four parity bits (1 parity bit/8 bit byte) as per the ITEK
specifications EL-CP-2817 [23]. These four parity bits are adequate for
this type of ROM.

6.3 Bus Modules

6.3.1  Bus  Extender  Module  (BEM) 

The block diagram of the BEM as defined in the ITEK specifications
EL-CP-2824-MCF [25] is given in Figure 6.6. The function of this
module is to provide proper conversion between the MCF buses internal and
external to the chassis. This is implemented by a pair of line drivers and
line receivers for each signal on the MCF bus as shown in Figure 6.7. The
transceiver control circuit determines and controls the proper direction of
the signal flow.

The  following  Built-in-Test  features  are  recommended. 


A.  Parity check on all groups of signals, such as address and data
    lines, where parity information has been generated by the source
    module. If on-line fault isolation to the BEM is important, then
    the parity check should be performed on both sides of the
    transceiver. Otherwise, a parity check on just one side would be
    sufficient. In the latter case it would be more advantageous to
    apply the parity check after the line receiver shown in Figure
    6.7. This would allow a parity check on signals on the internal
    chassis bus lines that have passed through both the line driver
    and line receiver.
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Figure  6.6.  BEM  Block  Diagram  [25] 


Figure  6.7.  BEM  Transceiver  [25] 
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B.  Loop  around  of  signals  on  the  bus  back  to  the  source  module.  This 
would  support  an  off-line  test  where  the  source  module  (e.g.  CPU) 
could  send  test  data  to  the  BEM  and  have  the  data  returned  to  it 
after  passing  it  through  the  line  drivers  and  line  receivers. 

Then  test  hardware  operation  by  comparing  the  sent  and  received 
data.  The  loop  around  test  may  be  accomplished  by  one  of  the 
following  two  methods.  1)  By  using  a dual  set  of  bus  lines,  one 
to  send  and  second  to  return  the  test  data.  2)  By  using  a single 
set  of  bus  lines  but  providing  a means  to  latch  the  sent  data  in 
the  BEM,  so  that  it  can  be  returned  in  subsequent  cycles  over  the 
same  bus  wires.  In  either  case,  substantial  increase  in  hardware 
is  required. 
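
The comparison step of the loop around test can be sketched as follows. The function loop_around_test, the 16-bit default width, and the choice of test patterns (all zeros, all ones, walking ones) are assumptions for illustration; the echo argument stands in for whichever of the two return paths is implemented.

```python
from typing import Callable


def loop_around_test(echo: Callable[[int], int], width: int = 16) -> int:
    """Send test patterns through `echo` and return a mask of bit lines that ever mismatched.

    A result of 0 means every pattern came back unchanged.
    """
    mask = (1 << width) - 1
    patterns = [0, mask] + [1 << i for i in range(width)]  # zeros, ones, walking ones
    bad_lines = 0
    for sent in patterns:
        received = echo(sent) & mask
        bad_lines |= sent ^ received   # accumulate every line that differed
    return bad_lines
```

For example, a path with bit line 3 stuck low returns a mask with only bit 3 set, which points at one driver, receiver, or backplane wire.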


6.3.2  Bus  Interface  Module  (BIM2) 

The block diagrams of the BIM2 as defined in the ITEK F3 Specifications
EL-CP-2828-MCF [26] are given in Figures 6.8 and 6.9. The function of the
BIM2 is to allow the AN/UYK-41(V) I/O bus (UNIBUS) to be emulated
by an MCF CPU. In order to perform this function, the BIM2 must coordinate
the timing requirements imposed by the MCF buses (M(X) and Event buses) and
the timing requirements imposed by the AN/UYK-41(V) I/O bus (UNIBUS).

As can be seen from Figures 6.8 and 6.9, most of the functions of BIM2
require  synchronous  and  asynchronous  control  logic  for  which  built-in-tests 
are  indeed  difficult  to  specify.  In  general,  the  type  of  built-in-tests 
recommended  for  the  control  and  data  paths  of  the  CPU  module  are  also 
applicable  to  the  BIM2  module.  Furthermore,  the  loop  around  technique 
recommended  for  the  BEM  can  also  be  applied.  The  following  module  level 
built-in-tests  are  recommended. 


A. Parity check on address and data lines of both the MCF M(X) bus
and the AN/UYK-41(V) bus. This would help to localize any
problems with the data transfer paths of the BIM2.

B. Parity generation and check on all internal data, address and
mapping registers of the BIM2.

Figure 6.9. BIM2 Function Block Diagram [26]

C. Provide capability to access (load and read) all internal control
and status registers via I/O instructions from the CPU.

D. Provide timeout circuits on all communication signals that use the
handshake protocol for both the MCF and AN/UYK-41(V) buses.

E. Provide latches on the MCF bus interface of BIM2 to allow loop
around of the data sent from the CPU to the BIM2. This would
check the MCF bus leading to the BIM2 module and the bus receivers
on the BIM2 module under the control of the CPU.

6.4  Input/Output  Modules 

The  recommended  BIT  approach  for  the  Input/Output  Expansion  chassis 
will  include  a module  devoted  to  chassis  level  fault  detection,  isolation, 
and  reporting.  The  module  will  also  provide  an  interface  to  the  chassis 
maintenance  panel.  This  module  will  be  almost  identical  to  the  BIT  module 
in  the  memory  expansion  chassis  since  it  will  perform  a similar  function. 
This  BIT  module  will  be  programmable  to  enable  it  to  perform  an  off-line, 
stand-alone  test  of  all  the  modules  in  the  chassis.  In  addition,  it  will 
communicate with the BIT hardware on each I/O module and with the system BIT
software  in  the  fault  isolation/reporting  process.  This  section  deals  with 
module  level  BIT;  therefore,  chassis  BIT  will  not  be  explained  in  detail  at 
this  time. 

Since no individual I/O modules have yet been specified, it is difficult
to recommend specific built-in-test techniques. However, several
generally applicable BIT approaches are available to the designer of the
I/O module. These concepts, discussed below, should be included in the
design  stage  of  these  modules.  These  techniques  would  provide  module  level 
BIT  for  I/O  modules  whether  the  modules  are  located  in  an  I/O  expansion 
chassis  or  in  a main  computer  chassis. 

6.4.1 Parity - A single parity bit on each eight-bit byte of data
should be generated (checked) when it enters (leaves) the MCF bus. In
addition, the use of parity within all I/O modules is recommended as a
method to detect errors in the data. It is also recommended to send
(check) parity when data is sent out of (received into) the MCF I/O
expansion chassis. In this way there would be a check on the entire data
path from the external I/O device through the cables and I/O module and
through the MCF bus system. In addition to data, parity should also be
sent with addresses and operation codes where these are sent over the
communication buses. The added hardware cost of these BIT approaches
would be approximately 15%.


6.4.2  Loop  Around  - Loop  around  involves  a controlling  device 
sending  a particular  command  to  another  device  which  causes  the  receiving 
device  to  send  an  acknowledge  back  to  the  controller.  This  technique 
provides  the  capability  to  check  the  bus  wiring,  connectors,  and  the 
receivers/drivers of an I/O module. While this test by itself leaves a
major  portion  of  a module  unchecked,  the  functions  which  are  checked  are 
essential  to  the  operation  of  the  module.  This  communication  section  must 
function  properly  if  it  is  to  be  tested  more  completely  with  software. 
Successful  loop  around  testing  gives  a higher  confidence  level  to  off-line 
tests  and  aids  fault  localization,  particularly  in  bus  faults.  The  small 
amount  of  hardware  required  to  perform  this  BIT  function  would  consist  of  a 
set  of  multiplexers  (or  registers)  on  each  I/O  module  to  store  the  data  to 
retransmit  on  the  bus. 

6.4.3  Time-Out  - In  addition  to  the  time-out  checking  within  the  CPU 
there  should  be  timers  within  the  I/O  modules  which  would  detect  errors  in 
the control portions of the modules. Control circuitry is in general
difficult to test short of duplication. Time-out signals do, however, provide
an  inexpensive  method  to  detect  major  faults  in  most  control  functions. 

6.4.4 Replication - For cost reasons, it is not recommended to
duplicate each I/O module. However, some critical portions of some modules
may be candidates for duplication. By duplicating only relatively small
sections of a module it is possible to keep BIT hardware down to a
reasonable level, while providing essential on-line fault detection
capability.

In addition, it may prove necessary from a reliability viewpoint for some
applications to provide TMR (triple modular redundancy) for a few vital
control functions. This approach, while expensive, does provide fault
tolerant hardware that allows the unit to perform its required function
until it is convenient to repair it.
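
The masking behavior of TMR can be illustrated with a bitwise majority voter. This is a generic sketch of the technique, not a circuit taken from the MCF specifications; the function names are introduced here for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_disagreement(a: int, b: int, c: int) -> int:
    """Mask of bit positions where the three copies are not unanimous (useful for reporting)."""
    return (a ^ b) | (a ^ c)
```

The voter output stays correct with one faulty copy, while the disagreement mask lets BIT report the fault so that repair can be scheduled before a second copy fails.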

6.5 Built-In-Testing for the MCF Analog Modules

The two major analog modules in the MCF system are the power converter
module (PCM) and the Fan assembly. From the most rudimentary point of
view, these modules can be tested simply by measuring those output
parameters which interface with the remainder of the MCF: output voltage,
ripple, regulation, air flow, etc. Because of the analog nature of the PCM
and Fan modules, however, these output parameters are strongly influenced
by environmental parameters such as operating temperature, line voltage,
and load current. As such, if the false alarm rate for the BIT system is
to be kept within reasonable bounds, one must also measure such
environmental factors to determine whether a measured deviation of the
system output parameters is due to a failure of the analog module or to
the environment in which it operates. Finally, high power analog devices
such as the PCM and Fan are characterized by thermal and mechanical
parameters (transformer and rectifier temperatures, fan vibration, etc.)
which are indicative of failures not yet manifested in the output
parameters. An effective BIT system should therefore measure such
parameters as a means of spotting impending system failures, thereby
preventing costly burnouts.

Given the above observations, rather than simply monitoring a set
of output parameters, an effective analog BIT system must be able to
compare, extrapolate, and evaluate measured data. As such, some type of
"intelligence" is required by the BIT system. Unlike the digital MCF
modules, the analog modules have no inherent "intelligence" which can be
shared with the BIT system. As such, the key to a BIT system for the
analog modules is the inclusion of some type of BIT "brain". The design
of the "brain" is the key factor in determining the performance
characteristics (time to detect, false alarm rate, etc.) of the BIT system,
with various tradeoffs possible between serial and parallel processors,
etc. Several possible organizations for the BIT "brain" are discussed in
the following sections along with the performance which can reasonably be
expected from each.

Although the inclusion of a BIT "brain" in an analog module may at
first seem to be economically prohibitive, this is not, in fact, the case.
Given the relative cost of digital to analog devices, an entire
microprocessor system for the BIT "brain" would not represent a
significant percent increase in the cost of either the PCM or Fan. Indeed,
the entire cost of the BIT system would be recovered by the prevention of
a single major component burnout (fan motor, transformer, etc.). Moreover,
the BIT system replaces the usual protective circuitry in either the PCM
or Fan. As such, only the cost difference represents a true cost increase
attributable to BIT.

In  the  following,  several  potential  BIT  "brain"  organizations  are 
discussed  and  the  performance  to  be  expected  from  a typical  design  is 
evaluated.  The  primary  parameters  in  the  PCM  and  Fan  modules  which  should 
be  monitored  are  tabulated  in  the  following  section.  This  is  followed  by  a 
description of several BIT "brain" organizations in Section 6.5.2.
Section 6.5.3 discusses the BIT power supply and several techniques for
assuring reliable BIT performance in the face of a failure of its own
power supply. Section 6.5.4 is devoted to a discussion of BIT self-test
procedures, while Section 6.5.5 is devoted to an evaluation of a typical
BIT system vis-a-vis its false alarm rate, time-to-detect, percentage of
failures detected, and reliability.


6.5.1  What  to  Measure 

As indicated above, the PCM and Fan parameters which should be measured
by a BIT system may naturally be categorized into three classes: output
parameters, environmental parameters, and thermal and mechanical
parameters. Since the time constants associated with these parameters vary
from microseconds to seconds, they may be further classified into fast,
medium and slow categories dependent on the speed with which a failure
must be detected. The primary PCM parameters to be monitored are tabulated
below.

6.5.1.1 Output Parameters


1. Output voltages: 5.08 ± 0.05 Vdc, -12.05 ± 0.05 Vdc, and 15.10 ±
0.05 Vdc. In addition to monitoring these parameters the BIT
system is required to initiate an automatic cut-off sequence when
the output voltages reach a critical overvoltage. For this
purpose the cut-in limits are 6.3 ± 0.5 Vdc for the 5 volt line, -14.0
± 0.7 Vdc for the -12 volt line, and 16.9 ± 0.7 Vdc for the 15 volt
line. Furthermore, the BIT system must issue an interrupt to
initiate a power-down cycle whenever any of these output voltages
reaches an undervoltage state of 20 ± 10 percent less than nominal.

2.  Air  Flow:  The  flow  rate  of  the  MCF  Fan  module  is  specified  as  a 
function  of  static  pressure,  hence,  both  of  these  parameters  must 
be  measured  and  compared  by  the  BIT  system  to  verify  proper  Fan 
performance. 

3.  Ripple:  This  parameter  must  be  less  than  100  millivolts  on  all 
output  lines  over  the  entire  operating  range  of  the  PCM.  Upon 
failure  the  BIT  system  should  send  an  interrupt  indicating  failure 
but  no  shutdown  sequence  is  required. 

4.  BIT  System  Voltage:  A deviation  from  nominal  by  the  BIT  system 
voltage  should  be  signaled  by  an  interrupt.  Here,  the  threshold 
for  the  failure  indication  should  be  set  well  within  the  tolerance 
limits  of  the  BIT  system  to  guarantee  correct  performance  of  the 
BIT system in spite of the failure of its own power supply.
Protection of the BIT system power source is discussed in more detail
in Section 6.5.3.
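
The output voltage limits in item 1 can be expressed as a small classification routine. This is a sketch under simplifying assumptions: the tolerance bands on the cut-in limits are ignored (nominal thresholds only), the comparison uses magnitudes so that a polarity reversal is not detected, and the names Rail, RAILS, and classify as well as the "out of tolerance" status are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rail:
    nominal: float       # signed nominal output, Vdc
    tolerance: float     # allowed deviation from nominal, Vdc
    overvoltage: float   # signed nominal cut-in limit, Vdc


# Values taken from item 1 above (nominal limits, tolerance bands on limits ignored).
RAILS = {
    "+5": Rail(5.08, 0.05, 6.3),
    "-12": Rail(-12.05, 0.05, -14.0),
    "+15": Rail(15.10, 0.05, 16.9),
}


def classify(rail: Rail, measured: float, undervoltage_fraction: float = 0.20) -> str:
    """Classify one measurement by magnitude against the rail's limits."""
    m, nom = abs(measured), abs(rail.nominal)
    if m >= abs(rail.overvoltage):
        return "overvoltage: cut off"
    if m <= nom * (1.0 - undervoltage_fraction):
        return "undervoltage: power-down interrupt"
    if abs(m - nom) > rail.tolerance:
        return "out of tolerance"
    return "ok"
```

For instance, classify(RAILS["-12"], -9.0) falls below 80 percent of nominal and therefore requests the power-down interrupt.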

6.5.1.2 Environmental Parameters:


1. Line Voltage: 95-130 Vac at 47-440 Hz. Although the PCM and Fan
are required to operate over a wide range of line voltages and
frequencies, the actual values of these parameters must be
monitored within this range to determine if an observed deviation
in the output parameters is, in fact, due to a problem in the line
voltage rather than the MCF.

2. Load Current: For the 5 volt line the output current
specifications are 8-40A for the PCM1 and 8-65A for the PCM2. For the
-12 volt line they are 0.2-6.5A and 0.2-13.0A, respectively, and for
the 15 volt line they are 0.2-4.0A and 0.2-7.0A, respectively. The BIT
system should monitor these parameters both for the purpose of
distinguishing between PCM failures and load failures and to
initiate protective circuitry in the event of overload.

3. Internal Voltages and Currents: It is believed that, to minimize
false alarms, certain internal PCM voltages and currents
should be monitored. In effect, such parameters yield a degree
of fault isolation within the PCM. Although it is not our goal
to isolate failures beyond the module level, we believe that a
certain amount of fault isolation within the module is necessary
to prevent false alarms. Indeed, since in an analog system
failure detection is unambiguous, false alarm prevention amounts
to accurately distinguishing between failures in a module and
failures in its environment (i.e. fault isolation up to the
module).

6.5.1.3 Thermal and Mechanical Parameters:

1. Diode Junction Temperature: Since a major cause of power supply
failures is diode burnout, the MCF specifications require that
diode junction temperatures be monitored, with automatic shutdown
being initiated whenever they reach critical temperature (146°C).
Of course, the BIT system may also employ junction temperature
measurements in a manner analogous to that described above for the
internal voltages and currents as an aid in false alarm prevention
(fault isolation).

2. Component Temperatures: Although the output parameters of the PCM
or Fan module may still be in tolerance, if critical component
temperatures including transformer temperature, fan motor
temperature, and chassis temperature are out of tolerance, a failure
or impending failure in the module is indicated. As such, these
parameters should be monitored.

3.  Vibration:  As  with  component  temperatures  unusual  vibration  in 
either  a power  handling  PCM  component  or  the  Fan  assembly  is 
indicative  of  system  failure  and  should  be  reported  by  the  BIT 
system. 

4. Humidity: Since changes in humidity are a good and easily monitored
indication of seal integrity in hermetically sealed components
(i.e., transformer and fan motor), it is recommended that the PCM
and Fan BIT systems include a humidity monitoring capability for
such components.

5. Ambient Temperature and Humidity: For the temperature and
humidity measurements discussed above to be meaningful, they must
be compared to ambient values. As with the previously discussed
environmental parameters, this comparison is necessary to prevent
false alarms.

Since the time constants associated with changes in the above vary
from microseconds to seconds, the BIT system's handling of these parameters
must also vary. For this purpose, the above described parameters are
tabulated below in three categories characterizing their time constants as
fast, medium or slow. The fast time constant parameters are those whose
failure must be detected on a microsecond time scale if damage to the
system is to be prevented. The medium time constant category represents
parameters whose failure should be detected in a period of milliseconds.
These are typically parameters whose failure will not lead to immediate
damage to the system but may cause unreliable operation of the MCF.
Finally, the time constants underlying most of the thermal, mechanical, and
humidity related parameters are sufficiently long that these parameters
need only be monitored on an interval of a second or so.

Fast Parameters:

1. Overvoltage

2. BIT System Voltage

3. Line Voltage

4. Load Current

5. Internal Voltages and Currents

Medium Parameters:

1. Undervoltage

2. Air Flow

3. Ripple

4. Diode Junction Temperatures

Slow  Parameters: 

1.  Component  Temperatures 

2.  Vibration 

3.  Humidity 

4.  Ambient  Humidity 

6.5.2  BIT  System  Organization 

Because of the time constant variations among the parameters which the
PCM and Fan BIT systems must monitor, no one organization for the BIT
"brain" is immediately obvious. Since the time constants for the fast
parameters are on a par with the cycle speed of a typical microprocessor, a
parallel processing scheme is needed for handling these parameters. On the
other hand, the time constants for the slow parameters are quite compatible
with a sequential processor. These considerations, however, must be
balanced against the computation and comparison of data required to prevent
false alarms. As such, we have investigated three potential BIT system
organizations in an effort to determine the optimal performance which can
be expected from the PCM and Fan BIT systems. These include a parallel
processor with minimal computation capability, a microprocessor based
serial processor, and a hybrid of the two. These are described below
together with a comparison of their capabilities.


6.5.2.1 Latches Plus Logic - The simplest and fastest BIT organization
investigated is illustrated schematically in Figure 6.10. This parallel
processor, which we term "latches plus logic" (L + L), is based on an
array of latches which sense the various analog system parameters.
Whenever a parameter crosses a prespecified threshold, a latch is triggered
and sends a binary input to a hard wired logic array which makes a Boolean
decision as to whether or not the deviation from nominal by this parameter
represents a module failure. If so, the appropriate interrupt or automatic
shutdown sequence is initiated.

The  main  advantage  of  the  L + L organization  is  the  speed  with  which 
it  can  detect  the  failure  of  a fast  time  constant  parameter.  On  the  other 
hand  only  Boolean  information  is  available  to  the  logic  array  which  may,  in 
turn,  make  only  Boolean  decisions.  As  such,  little  capability  for  careful 
data  analysis  and  false  alarm  prevention  exists. 
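
The L + L behavior can be mimicked in software to show what "only Boolean information" means in practice. The Latch class and the particular decision equation below (an output overvoltage counts as a module failure only if the line voltage latch is clear) are hypothetical examples, not the logic array of Figure 6.10.

```python
class Latch:
    """Threshold latch: once the input exceeds the threshold, the output stays set until reset."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.is_set = False

    def sense(self, value: float) -> bool:
        if value > self.threshold:
            self.is_set = True
        return self.is_set

    def reset(self) -> None:
        self.is_set = False


def module_failure(output_over: bool, line_over: bool) -> bool:
    """Hypothetical hard-wired decision: blame the module only if the line input looks normal."""
    return output_over and not line_over
```

The logic sees only that a threshold was crossed, not by how much or for how long, which is why this organization offers little room for cross-checking.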

6.5.2.2 Analog to Digital Converter plus Microprocessor - A second
BIT organization made up from an A/D converter and a microprocessor
(A/D+µP) is illustrated in Figure 6.11. Here, the analog data is fed into
a multichannel A/D converter with the various channels being called
sequentially by a microprocessor. The µP then analyzes the data from each
channel and compares data from various combinations of channels to detect
failures and to determine whether or not their cause is within the given
analog MCF module.

The advantage of the A/D+µP organization is its powerful computational
capability, which allows it to compare data from several channels and/or to
compare data with stored information to determine whether a detected
failure is internal or external to a given module. As such, the false
alarm rate can be minimized with the A/D+µP organization. Unfortunately,
given the speed of a typical microprocessor and the sequential nature of
the A/D+µP BIT organization, the time interval between successive samples
of a given channel may be as large as a millisecond. Since this exceeds
the time constant of the fast parameters, this BIT organization may not be
acceptable in practice. Of course, several modifications can be made to
improve the performance of the A/D+µP scheme. In particular, one might use
aperiodic sampling, in which the fast parameters are sampled more often
than the medium and slow parameters.
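
One simple way to sample the fast parameters more often is to revisit every fast channel between consecutive samples of the other channels. The generator below is a sketch of that idea only; the function name scan_order and the channel names in the usage note are illustrative assumptions.

```python
from itertools import cycle
from typing import Iterator, List


def scan_order(fast: List[str], others: List[str]) -> Iterator[str]:
    """Yield channel names indefinitely: all fast channels, then one of the other channels."""
    for other in cycle(others):
        yield from fast
        yield other
```

With fast channels ["overvoltage", "load current"] and others ["ripple", "humidity"], the first six samples are overvoltage, load current, ripple, overvoltage, load current, humidity. Each fast channel is then revisited every three samples regardless of how many medium and slow channels exist.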

6.5.2.3 Analog to Digital Converter plus Microprocessor with
Interrupt - One feasible approach for improving the performance
of the A/D+µP scheme described above is to combine it with the L + L
scheme in a hybrid, as illustrated in Figure 6.12.

The resultant scheme, which we term A/D+µP with interrupts, in its
normal mode operates just like the A/D+µP scheme described above. It,
however, also has an array of latches which monitor the analog signals from
the various sensors and trigger an interrupt to the µP whenever any analog
signal reaches a critical threshold. As such, whenever a fast parameter
crosses its threshold, the µP is called from whatever it is doing to
investigate the fast parameter before any catastrophic failure takes place.
The scheme, therefore, has the computational power and false alarm
prevention capability of the A/D+µP BIT organization with a reaction time
comparable to the L + L organization.

A summary of the three proposed BIT organizations and their
capabilities vis-a-vis reaction time and false alarm prevention is given in
Table 6.2 below.


Figure 6.12. Block Diagram of A/D+µP With Interrupt BIT Organization

TABLE 6.2. PERFORMANCE OF THE THREE PROPOSED BIT ORGANIZATIONS

                          L+L                A/D+µP           A/D+µP with Int.

Reaction Time             very fast (1 µs)   medium (1 ms)    fast (50 µs)

False Alarm Prevention    minimal            good             good

6.5.3  The  BIT  System  Power  Supply 

In  designing  the  BIT  system  for  the  power  converter  module,  one  must 
take  special  care  to  design  a system  which  will  perform  satisfactorily  in 
the  face  of  a failure  of  its  own  power  source.  The  key  to  achieving  this 
goal lies in the use of a digital signal processor for the analysis of analog
data. Such a processor will operate effectively as long as its power
supply  is  operating  within  some  specified  tolerance  limits.  As  such,  if 
the  threshold  of  failure  for  the  BIT  system  power  supply  is  taken  to  be 
well  within  these  tolerance  limits  the  BIT  system  will  reliably  report  the 
failure  of  its  own  power  supply. 

Secondly,  to  cope  with  catastrophic  failures  in  the  PCM,  the  BIT  power 
supply  must  be  isolated  from  the  PCM  and  have  a hold  time  exceeding  the 
time  to  detect  a failure  in  the  BIT  power  supply.  This  can  be  achieved 
with  the  simple  diode  circuit  illustrated  in  Figure  6.13  or  the  equivalent. 
Here,  the  diode  effectively  disconnects  the  BIT  power  supply  from  the  PCM 
when the 15 V PCM line fails, while the RC time constant of the BIT power
supply is chosen to obtain the desired hold time. Since the current
requirements of the BIT system will be relatively small, this can be
achieved using reasonable RC values.
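
A first-order estimate of the hold time can be sketched as follows. It assumes the BIT load behaves as a fixed resistance R discharging the hold-up capacitor C from a starting voltage down to the minimum voltage at which the BIT logic still operates, so that the hold time is R*C*ln(V_start/V_min). This simple model and the function names are assumptions made here; Figure 6.13 and the actual load characteristics would govern a real design.

```python
import math


def hold_time_s(r_ohms: float, c_farads: float, v_start: float, v_min: float) -> float:
    """Time for an RC discharge to fall from v_start to v_min (resistive-load assumption)."""
    if not (0.0 < v_min < v_start):
        raise ValueError("require 0 < v_min < v_start")
    return r_ohms * c_farads * math.log(v_start / v_min)


def capacitance_for_hold(r_ohms: float, hold_s: float, v_start: float, v_min: float) -> float:
    """Capacitance needed to achieve a desired hold time under the same assumption."""
    if not (0.0 < v_min < v_start):
        raise ValueError("require 0 < v_min < v_start")
    return hold_s / (r_ohms * math.log(v_start / v_min))
```

Because the required hold time only needs to exceed the time to detect a BIT supply failure, which is on the order of microseconds to milliseconds, modest component values suffice under this model.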

Finally, the display used to indicate PCM failures must be
non-volatile so that the failure indication remains after shutdown of the
system.


6.5.4  Testing  the  Tester 

As with any test system, one must face the question of testing the
tester. For the BIT "brain" one can use standard digital system test
algorithms. When a microprocessor is employed, a self test algorithm can be
used, either in slack time or periodically programmed into the test
algorithm. Alternatively, the MCF itself or one of the other distributed
processors in the BIT system can be used to test the BIT "brain" in the
analog modules.

Secondly,  the  BIT  brain  should  be  programmed  to  test  its  own  sensors. 
While  an  outright  determination  of  sensor  accuracy  can  only  be  achieved 
with  some  type  of  redundancy,  the  BIT  "brain"  can  easily  spot  open  and 
short  circuited  sensors  which  represent  the  majority  of  sensor  failures. 
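
A sensor self-check of this kind can be as simple as flagging readings pinned at either end of the converter range. The sketch below assumes an 8-bit reading and a circuit in which a shorted sensor reads near zero and an open sensor reads near full scale; the function name, the default values, and that polarity are all assumptions for illustration.

```python
def sensor_status(code: int, full_scale: int = 255, margin: int = 2) -> str:
    """Classify a raw A/D reading as 'short', 'open', or 'plausible'."""
    if code <= margin:
        return "short"        # reading pinned at the bottom of the range
    if code >= full_scale - margin:
        return "open"         # reading pinned at the top of the range
    return "plausible"        # in range; accuracy itself is not verified
```

A "plausible" result says nothing about accuracy, which, as noted above, requires some form of redundancy to establish.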

6.5.5  Evaluation  of  Analog  BIT  System  Performance 


For an analog system, the usual measures of test system performance
such as percent of failures detected and false alarm rate are somewhat
inappropriate. Indeed, except for transients, all failures should be
detected and all detected failures represent an actual (or transient)
failure.

On the other hand, false alarms will result from the loading effects
associated with analog systems since such effects may cause a failure in
one module to be manifested in another or a change in the system
environment to be reported as a module failure. In essence, the false alarm
rate for an analog BIT system is a measure of the ability of the system to
isolate a detected failure to the correct module (or the environment)
rather than a measure of the probability that a detected failure is "real".

In spite of the above comments, in the following evaluation we have
used the standard digital system BIT performance criteria, appropriately
reinterpreted, to make our evaluation of the PCM and Fan module BIT systems
compatible with the evaluations of the digital MCF modules. The evaluation
assumes the A/D+µP BIT organization with interrupts, implemented with a
Mostek F8 microprocessor. Although this is certainly not the ultimate
approach or the only implementation thereof, we feel that its performance
is a reasonable benchmark by which the performance of other BIT system
organizations and/or implementations may be measured.

Our prototype BIT system for the PCM and Fan modules of the MCF is based
on a recently introduced variation on the F8 (Model 3870) which contains 2K
of ROM and 64 bytes of RAM on the CPU chip, thereby yielding a single chip
µP system. In addition to the µP, a monolithic A/D converter (Teledyne 8703)
and an octal latch (Fairchild 74S373), together with discrete signal and
power conditioning components and sensors, make up the BIT system. The
performance parameters as they apply to the PCM module are discussed in the
next paragraph.

The percent of failures detected will be 100% for the PCM module with
BIT. As long as all of the output parameters through which the analog
modules interface with the MCF are monitored, all failures will be
detected. All detected failures will represent real or transient failures
in the MCF or its environment. As indicated above, the false alarm rate in
the analog BIT systems is really a measure of the ability to isolate a
detected failure to the correct module or the MCF environment. With a full
microprocessor in the BIT "brain" and given time to run cross-checks on a
detected failure, the false alarm rate can reasonably be kept under 5
percent. Since all computation required for the BIT system can be done by
comparison of data taken from one channel with data taken from another
and/or stored parameters, the analysis of the data taken from any one
channel can be conservatively carried out in 25 cycles of the uP. Given the
2 us cycle time of the F8, with the system programmed to check for
interrupts after the analysis of data from each channel, this will result
in a time to detect of 50 us. Here, we note that one can trade off BIT
reaction time against false alarm rate, since making more cross-checks to
reduce the false alarm rate will increase reaction time and vice versa. Of
course, one does not require fast reaction time for the various medium and
slow time constant parameters and can therefore make more cross-checks on
these parameters than on the fast time constant parameters.
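
The timing argument above can be checked with a few lines of arithmetic.
The sketch below uses only the figures quoted in the text (25 uP cycles per
channel, 2 us per cycle). The `cross_checks` parameter, and the assumption
that each additional cross-check costs one more channel analysis, are
illustrative and are not taken from the study.

```python
# Sketch: BIT time-to-detect for one analog channel.
CYCLES_PER_CHANNEL = 25   # conservative analysis cost per channel, in uP cycles
CYCLE_TIME_US = 2.0       # F8 cycle time, in microseconds


def time_to_detect_us(cross_checks: int = 1) -> float:
    """Detection latency, assuming each cross-check costs one channel analysis.

    The one-analysis-per-cross-check model is a simplifying assumption.
    """
    return cross_checks * CYCLES_PER_CHANNEL * CYCLE_TIME_US


if __name__ == "__main__":
    for checks in (1, 2, 4):
        print(f"{checks} cross-check(s): {time_to_detect_us(checks):.0f} us")
```

With a single analysis the result is the 50 us quoted above; each extra
cross-check lengthens reaction time proportionally, which is the trade-off
against false alarm rate that the text describes.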

The cost parameters that measure hardware cost (power, space, and
failure rate) are discussed in Section 8.0. This module level BIT has no
effect on the application programs, but the small impact on diagnostic and
operating system software is also discussed in Section 8.0. More detailed
information on the cost parameters for all the modules is given in
Appendix C.

7.0  ELAPSED TIME MEASUREMENT

The purpose of specifying an elapsed time recorder and indicator on
each module is to be able to enforce the MCF warranty concept on a module
basis. With such an indicator, it is possible not only to tell if a module
has been in use past its warrantied period, but to establish a failure rate
distribution as the modules are sent back for repair. This type of
information is useful in refining the failure rate model for the modules
and identifying batches of unreliable modules. For these reasons, it is
desired that the elapsed time indicator show more than whether or not the
warranty period is passed. It should show with some degree of accuracy (at
least 3%) the actual time the module has been powered on. Two distinct
approaches have been considered as candidates for recording this time
interval. One is analog and one is digital, with each approach having its
own set of advantages and disadvantages.

The candidate analog version is a small transparent tube filled with a
mercury compound that plates out onto the tube walls when a voltage is
applied across the ends of the tube. This tube would be connected to the
module's five-volt supply. The amount that has plated onto the wall of the
tube will be proportional to the time period that power has been applied to
the module. Commercial versions have been used successfully for a number of
years and their cost is low (about five dollars). This simple, inexpensive
device has a non-volatile display enabling the elapsed time to be read by a
serviceman or operator whether the module is in or out of the chassis and
with or without power applied.

However, the elapsed time cannot be read at any time by the computer,
and to be read by anyone, a cover would probably have to be removed from a
computer chassis. Another problem may arise from use in environments that
are subject to a high level of shock: some of the mercury may become
dislodged from the tube, thus giving an erroneous reading.

The alternative is a digital technique utilizing a counter and
non-volatile storage. Since timing components such as crystals and
capacitors have a high failure rate as compared with digital logic, it is
not desirable to have a self-contained timer on each module. A more
reliable approach would be to have a clock in each chassis that would send
a timing signal to all modules within the chassis. This clock should have
duplicated timing components so that the timing signal's reliability would
be high. On each module there would be a non-volatile storage device such
as EAROM (electrically alterable read-only memory) or a ROM with fusible
links. Some small amount of control circuitry would also be needed to
properly code the timing signals into the storage device. To reduce the
amount of control circuitry that would need to be added to each module, the
central clock generation circuitry should also contain the frequency
dividers. Only a pulse per hour or similar slow timing information need be
given to each module. Thus, the elapsed time counter on each module would
record every pulse that it receives. A block diagram of such a design is
shown in Figure 7.1.
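
The behavior of this scheme can be illustrated with a small software model.
This is only a sketch: the class and function names are hypothetical, the
integer field merely stands in for the non-volatile storage device, and the
10,000-hour warranty period used in the example is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class ElapsedTimeCounter:
    """Per-module counter driven by one chassis pulse per powered hour."""
    hours: int = 0  # stands in for the non-volatile store (e.g. EAROM)

    def pulse(self) -> None:
        # Each pulse from the chassis clock and divider adds one hour.
        self.hours += 1

    def past_warranty(self, warranty_hours: int) -> bool:
        return self.hours > warranty_hours


def run_chassis(modules, powered_hours: int) -> None:
    """Broadcast one pulse per powered hour to every module in the chassis."""
    for _ in range(powered_hours):
        for module in modules:
            module.pulse()


if __name__ == "__main__":
    new_module = ElapsedTimeCounter()
    old_module = ElapsedTimeCounter(hours=9_990)
    run_chassis([new_module, old_module], powered_hours=20)
    for m in (new_module, old_module):
        print(m.hours, m.past_warranty(warranty_hours=10_000))
```

Because the divider lives in the chassis clock, each module needs only the
increment-and-store logic modeled by `pulse`.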

This digital technique would have an advantage in that the computer
software, and thus the computer operator, would be able to examine the
elapsed time of each module in the system. This information could lead a
serviceman to first run diagnostics on modules that were past their
warranty period and perhaps significantly shorten the time required to
isolate to a single faulty module. However, this digital technique has a
limitation because once a module has been removed from a chassis, a
serviceman has no way of telling the elapsed time of the module. The module
must be plugged into some sort of tester that has a readout capability if a
serviceman wishes to determine if a module has passed its warranty period.

The digital technique would no doubt provide greater accuracy than the
analog one. The accuracy of the analog approach is limited by the length of
the tube and the precision with which the mercury's length can be plated and
measured. An accuracy of one percent (100 hours in 10,000 hours) is typical
of the best commercial units in use today. The digital timer could easily
resolve 10,000 or 15,000 hours down to the hour. Its accuracy is limited by
the drift of the central chassis clock, but this would probably be a crystal
with a typical accuracy of 0.01%. There would also be a small quantization
error associated with the digital approach.
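
To make the comparison concrete, the two error budgets can be worked out
for the operating times mentioned above. The sketch assumes the 1% analog
accuracy and 0.01% crystal accuracy quoted in the text, and it assumes the
digital quantization error is at most one pulse period (one hour); that
last bound is an illustration, not a figure from the study.

```python
def analog_error_hours(elapsed_hours: float, accuracy: float = 0.01) -> float:
    """Worst-case reading error of the plated-mercury indicator."""
    return elapsed_hours * accuracy


def digital_error_hours(elapsed_hours: float,
                        clock_accuracy: float = 1e-4,
                        pulse_period_hours: float = 1.0) -> float:
    """Crystal drift plus (assumed) one pulse period of quantization error."""
    return elapsed_hours * clock_accuracy + pulse_period_hours


if __name__ == "__main__":
    for hours in (10_000, 15_000):
        print(hours,
              round(analog_error_hours(hours), 1),
              round(digital_error_hours(hours), 1))
```

Under these assumptions the analog indicator is uncertain by about 100
hours at 10,000 hours, while the digital counter is uncertain by about two.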

The digital timers would take up more board space than the analog timer
and use one module connector pin. The analog technique would use no
connector pins. The digital timer, requiring several integrated circuits,
would cost more.

In summary, the analog timer is small, inexpensive, and able to be read
by visual inspection. The digital approach is very accurate and able to be
read by the computer. A comparison of these two alternatives is shown in
Table 7.1. A more detailed analysis of the required components and their
possible environmental limitations needs to be done before a final
specification is made. Perhaps by using both techniques, the advantages of
each can be effectively used.


TABLE 7.1  COMPARISON OF ELAPSED TIME INDICATOR IMPLEMENTATIONS

Analog                               Digital

- low cost                           - moderate cost
- visually readable                  - no visual readability
- no computer interface              - computer interface
- possible environmental problems    - rugged
- small size                         - moderate size
- moderate accuracy                  - very accurate

8.0  PERFORMANCE/COST EVALUATION OF THE RECOMMENDED BIT


This section deals with the quantitative assessment of the effectiveness
of the built-in-tests recommended in the previous sections for the MCF
AN/UYK-41(V) computer systems. Performance and cost of the recommended BIT
have been evaluated in terms of the BIT effectiveness criteria discussed in
Section 2.3.5.

It should be recalled that five BIT performance parameters and six BIT
cost parameters were identified in Section 2.3.5. The performance
parameters identified were: probability of detection (P_SFD), probability
of localization (P_LFE), probability of false alarm (P_FA), time to detect
(T_SFD), and time to localize (T_LFE). The BIT costs were categorized into
hardware and software costs. The cost parameters identified were: space
(A), power (P), and failure rate (FR) for the hardware, and operating
system (OS), application software (AS), and diagnostic software (DS) for
the software.

In order to meaningfully quantify these parameters it is necessary to
identify a baseline system configuration whose characteristics are
explicitly known. Then the impact of the BIT at the module, chassis, and
system level can be ascertained relative to this baseline system
configuration.

It should be emphasized that the MCF computer systems have not yet
been designed and therefore functional logic diagrams are not yet
available. The evaluation of the BIT performance and cost is based on the
best available information at this time. The F3 specifications permit a
reasonably good engineering estimate of the BIT performance and cost at the
module and chassis level. However, due to lack of information on the MCF
software, the system level BIT performance and costs are difficult to
estimate.

In the following sections, a baseline system is defined and then BIT
approaches at the module, chassis, and system level are evaluated with
respect to it. In the process of evaluating the BIT effectiveness a number
of assumptions were made in order to quantify the performance and cost
parameters listed above. These assumptions are explained in the sections
where they are applicable.

8.1  Baseline System Definition

For purposes of the BIT effectiveness evaluation, the AN/UYK-41(V)
single processor system has been chosen as the baseline system. It is
assumed that this system consists of a Main Computer Chassis No. 1, one
Memory Expansion Chassis, and one I/O Expansion Chassis. Furthermore, it
is assumed that each chassis is populated with its full complement of
modules. This forms a system with 256K words of memory and 14 MCF I/O
channels, which is assumed to be a typical single processor configuration
[8].

A summary of the essential MCF module specifications is given in
Table 8.1. They have been obtained from the F3 specifications where
available. Other parameters were estimated as explained in the notes. Of
particular importance are the power dissipation and failure rate
specifications, which will be used for the purposes of the cost estimation
of the baseline system.

The space requirements of digital circuits can be more easily compared
in terms of the number of integrated circuit (IC) packages (or chips) than
the physical dimensions of the modules. For this reason, the number of IC
chips per MCF module was estimated using the physical dimensions of the
modules as a basis. Table 8.2 shows these estimates. The primary
assumption here is that each module consists of one or more printed
circuit boards. A more subtle assumption is that a major portion of the
logic design is done with LSI circuits to increase the function density of
each module. Other assumptions in deriving the chip count for modules are
given in the notes on Table 8.2.

Using the information in Tables 8.1 and 8.2, the space, power, and
failure rate characteristics of the single processor system are derived as
shown in Tables 8.3 and 8.4. Table 8.3 shows the cumulative totals for each
chassis, while Table 8.4 shows the cumulative totals by module type. The
numbers given in these two tables are used in succeeding sections to
determine the relative space, power, and failure rate increase due to
additional BIT circuitry at the module, chassis, and system levels.

TABLE 8.1.  SUMMARY OF RELEVANT MCF MODULE SPECIFICATIONS

Module          Module   Weight      Maximum Power     MTBF         Failure Rate
Name            Type                 Dissipation       (hours)      per 10^6 hours [1]

NRAM (32Kx18)   III       3.5 lbs.   70 W              15,000        66.7
                                     25 W (Standby)
VRAM (32Kx18)   III       3.5 lbs.   50 W              10,000       100
                                     40 W (Standby)
CPU3            IV        9.0 lbs.   80 W               6,000       166.7
MCM3            II        2.5 lbs.   30 W              40,000        25
BEM             II        3.0 lbs.   25 W              40,000        25
BIM2            II        3.0 lbs.   30 W              30,000        33.3
PCM2            V        12.0 lbs.   variable [2]       7,500       133.3
IOMOD           I         2.0 lbs.   20 W [3]          50,000 [4]    20

Notes:

[1]  Failure Rate = 10^6 / MTBF

[2]  Power Dissipation of PCM2 based on 70% efficiency

[3]  Estimated from I/O Expansion Chassis F3 specifications

[4]  Estimated from module type and maximum power dissipation.
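
Note [1] of Table 8.1 can be checked directly. The short sketch below
recomputes the failure rate column from the MTBF column; the dictionary and
function names are illustrative only.

```python
# MTBF values (hours) as listed in Table 8.1.
MTBF_HOURS = {
    "NRAM": 15_000, "VRAM": 10_000, "CPU3": 6_000, "MCM3": 40_000,
    "BEM": 40_000, "BIM2": 30_000, "PCM2": 7_500, "IOMOD": 50_000,
}


def failure_rate_per_million_hours(mtbf_hours: float) -> float:
    """Failure rate per 10^6 hours, i.e. 10^6 / MTBF (Table 8.1, note [1])."""
    return 1e6 / mtbf_hours


if __name__ == "__main__":
    for name, mtbf in MTBF_HOURS.items():
        print(f"{name:6s} {failure_rate_per_million_hours(mtbf):6.1f}")
```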
TABLE 8.2.  ESTIMATION OF TOTAL NUMBER OF IC CHIPS PER MCF MODULE

Module   Module   Dimensions              Estimated    Estimated     Estimated
Name     Type     (Depth, Width,          Number of    Usable Area   Number
                  Height), inches         Boards per   sq. inches    of Chips
                                          Module [1]   [2]           [3]

VRAM32   III      1.4 x 9.0 x 6.0         3            140.25        140
CPU3     IV       3.5 x 9.0 x 7.17        7            327.25        245
MCM3     II       .95 x 9.0 x 6.48        2             93.5          70
BEM      II       .95 x 9.0 x 6.48        2             93.5          70
BIM2     II       .95 x 9.0 x 6.48        2             93.5          70
IOMOD    I        .45 x 9.0 x 6.48        1             46.75         35
PCM2     V        4.9 x 9.0 x 7.17        1             46.75         35 [4]

Notes:

[1]  Assume .5 inch spacing between boards in module.

[2]  Assume 8.5 x 5.5 inches usable area per board.

[3]  Assume .75 IC chips/sq. inch density for Logic modules.
     1.0 IC chips/sq. inch density for Memory modules.

[4]  Although PCM2 module may not have digital ICs, an equivalent number of
     ICs has been determined for the purpose of estimating space cost for
     the PCM BIT.
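
The chip estimates in Table 8.2 follow mechanically from the board count
and the density assumptions in its notes. The sketch below reproduces them,
taking 46.75 sq. inches of usable area per board (the per-board figure
implied by the area column) and taking the board counts directly from the
table. Rounding down to a whole chip is an assumption made here; it happens
to match every entry.

```python
import math

USABLE_AREA_PER_BOARD = 46.75                 # sq. inches per board
DENSITY = {"logic": 0.75, "memory": 1.0}      # IC chips per sq. inch (note [3])

# (number of boards, module kind) as listed in Table 8.2
MODULES = {
    "VRAM32": (3, "memory"),
    "CPU3": (7, "logic"),
    "MCM3": (2, "logic"),
    "IOMOD": (1, "logic"),
}


def estimated_chips(boards: int, kind: str) -> int:
    """Usable area times chip density, rounded down (rounding is assumed)."""
    area = boards * USABLE_AREA_PER_BOARD
    return math.floor(area * DENSITY[kind])


if __name__ == "__main__":
    for name, (boards, kind) in MODULES.items():
        print(name, boards * USABLE_AREA_PER_BOARD, estimated_chips(boards, kind))
```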

TABLE 8.3.  SINGLE PROCESSOR SYSTEM CHARACTERISTICS
            WITHOUT BUILT-IN-TEST BY CHASSIS
            (DERIVED FROM F3 SPECIFICATIONS)

Chassis          Modules    Qty    Maximum Power   Failure Rate   Number
                                   Dissipation     /10^6 hrs.     of Chips

Main Computer    VRAM32       2    100 W            200            280
Chassis No. 1    CPU3         1     80 W            166.7          245
                 BEM          2     50 W             50            140
                 BIM2         1     30 W             33.3           70
                 IOMOD        4     80 W             80            140
                 PCM2         1    145 W [1]        133.3           35 [2]
                 Subtotal    11    485 W            663.3          910
                 Percent           36.9%            36.6%          38.2%

Memory           BEM          1     25 W             25             70
Expansion        VRAM32       6    300 W            600            840
Chassis          PCM2         1    139 W [1]        133.3           35 [2]
                 Subtotal     8    464 W            758.3          945
                 Percent           35.4%            41.8%          39.7%

I/O Expansion    BEM          1     25 W             25             70
Chassis          BIM2         1     30 W             33.3           70
                 IOMOD       10    200 W            200            350
                 PCM2         1    109 W [1]        133.3           35 [2]
                 Subtotal    13    364 W            391.6          525
                 Percent           27.7%            21.6%          22.1%

Total                        32    1313 W           1813.2         2380
Percent                     100%   100%             100%           100%

Notes:

[1]  Power dissipation of PCM2 based on 70% efficiency

[2]  Equivalent number of chips.
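
The PCM2 power entries in Table 8.3 are stated to be "based on 70%
efficiency". One reading that reproduces the 139 W and 109 W entries is
that the power converter dissipates the difference between its input power
and the load it delivers to the other modules in the chassis. The sketch
below assumes that model; the formula is an inference from the table, not
something the report states explicitly.

```python
def pcm_dissipation_watts(load_watts: float, efficiency: float = 0.70) -> float:
    """Heat dissipated in the converter while delivering load_watts.

    Assumed model: input = load / efficiency, dissipation = input - load.
    """
    return load_watts * (1.0 - efficiency) / efficiency


if __name__ == "__main__":
    # Loads are the sums of the non-PCM module powers in each chassis.
    loads = {
        "Main Computer Chassis No. 1": 100 + 80 + 50 + 30 + 80,
        "Memory Expansion Chassis": 25 + 300,
        "I/O Expansion Chassis": 25 + 30 + 200,
    }
    for chassis, load in loads.items():
        print(f"{chassis}: load {load} W, PCM2 {pcm_dissipation_watts(load):.1f} W")
```

Under this model the Main Computer Chassis load of 340 W gives a PCM2
dissipation of roughly 146 W, and the three chassis average close to the
131 W figure quoted for the PCM in the notes to Table 8.5.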


8.2  Evaluation  of  Module  Level  BIT 


The  Module  Level  BIT  as  described  earlier  consists  of  fixed  hardware 
for  on-line  fault  detection  and  programmable  hardware  for  idle  time  or 
off-line  fault  detection.  The  BIT  hardware  would  be  resident  on  the 
modules  and  would  test  the  functional  circuit  on  that  module.  At  the 
module  level  there  is  no  associated  BIT  software.  Whatever  software  is 
required  to  use  the  programmable  BIT  hardware  for  fault  detection  is 
assumed  to  be  at  the  chassis  or  the  system  level. 

The cost of the BIT hardware at module level is estimated as follows.
For each module, the recommended built-in-tests described in Section 6.1
were analyzed in detail. Recall that the built-in-tests were recommended
based on the type of subfunctions within each module. Thus, for each BIT
the number and type of IC chips required to test the subfunction partially
or completely were determined. The power dissipation per chip was obtained
from IC manufacturers and the failure rate data was obtained using
MIL-HDBK-217. The total number of chips, maximum power dissipation, and
the total failure rate were then computed for each built-in-test. This
detailed analysis is given in Appendix C. The hardware cost estimates are
summarized in Table 8.5.

There are several problems associated with the cost estimates given in
Table 8.5. 1) Because of the lack of detailed logic diagrams, the cost of
some of the recommended built-in-tests could not be estimated. Where the
cost could not be estimated, it was assumed to be zero, which tends to
lower the overall BIT cost figures. 2) On the other hand, for the
built-in-tests for which estimates could be made based on the functional
specifications, the BIT hardware was estimated in terms of SSI and MSI
chips. This is because the types of logic functions required for hardware
testing are not currently available as LSI chips. This tends to increase
the overall cost figures because it was assumed earlier that the functional
circuits on the MCF modules are designed with LSI chips. These two sources
of error partially compensate for each other.

Table 8.6 summarizes the performance/cost figures for the Module Level
BIT. The hardware cost figures are taken directly from Table 8.5. The
software costs are not applicable at the module level. In estimating


TABLE 8.5.  HARDWARE COST ESTIMATION FOR MODULE LEVEL BIT ON MCF MODULES

Module   Type of BIT                       Number of      Maximum Power      Failure Rate
                                           IC Chips       Dissipation        /10^6 hrs.

VRAM32   1) Data
            a) Parity, or                   8              4.0 W             10.8
            b) ECC                         32             14.5 W             34.0
         2) R/W Control
            a) Duplication                 --             --                 --
         Total (parity / ECC)              8 / 32         4.0 / 14.5 W       10.8 / 34.0
         Percent Increase [1]              5.7% / 22.9%   8% / [illegible]   [illegible] [2]

CPU3     1) Clock
            a) Maintenance clock
            b) Stepper clock               --             --                 --
         2) uSequencer
            a) Parity-uRegisters            4              1.6 W              0.97
            b) Support-ustep/break         --             --                 --
            c) Illegal opcode               2              0.1 W              0.13
         3) Control Store
            a) Parity                      11              3.3 W              2.78
            b) udiagnostic extension        5              2.5 W              2.81
         4) Timing & Control
            a) Duplication                 --             --                 --
         5) Int. Data Paths
            a) Parity-Int. Registers       15              5.9 W              3.98
         6) ALU
            a) Residue code                --             --                 --
         7) Control/Status Registers
            a) Duplication                 --             --                 --
         8) CP Interface
            a) MED                         --             --                 --
         9) MCF Bus Controller
            a) Parity-data/address          5              1.4 W              1.11
            b) Timeouts                     3              1.5 W              1.12
         10) ROM (256x36)
            a) Parity                       6              2.5 W              1.26
         Total                             51             18.8 W             [illegible]
         Percent Increase [1]              20.8%          23.5%              [illegible]

BEM      a) Parity-data/address             5             [illegible]         1.11
         b) Loop around                    11             [illegible]         3.52
         Total                             16              7.2 W              4.63
         Percent Increase [1]              22.9%          28.8%              18.5%

BIM2     a) Parity-data/address             5              2.0 W              1.11
         b) Parity-Int. registers          --             --                 --
         c) Timeout                         2              1.0 W             [illegible]
         d) Loop around                     6              2.7 W             [illegible]
         Total                             13              5.7 W              3.70
         Percent Increase [1]              18.6%          19%                11.1%

IOMOD    a) Parity-data/address             5              2.0 W              1.11
         b) Timeout                         2              1.0 W              0.75
         c) Loop around                    --             --                 --
         Total                              7              3.0 W              1.86
         Percent Increase [1]              20%            15%                 9.3%

PCM2     a) A/D + uP                       [illegible]     1.5 W              1.35
         Percent Increase [1]              [illegible]     1.1% [3]           1.01%

--  Indicates BIT technique not used in cost/performance estimation.

Notes:

[1]  Percent increases in space (A), power (P) and failure rate (FR) are
     calculated as follows:

     A  = (# of chips for BIT circuit/module) / (total # of chips/module)

     P  = (Max. power dissipation for BIT circuit/module) /
          (Max. power dissipation/module)

     FR = (Failure rate for BIT circuit/module) / (Total failure rate/module)

[2]  Failure rate for the memory module decreases due to the use of error
     correcting code. See details in Appendix C.

[3]  Average of the maximum power dissipation for the PCM is 131 watts.
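
The percent-increase figures defined in note [1] can be recomputed from the
module totals in Table 8.5 and the baseline values in Tables 8.1 and 8.2.
The sketch below does this for the BEM and IOMOD modules, whose totals are
fully legible; the function name is illustrative.

```python
def percent_increase(bit_value: float, module_value: float) -> float:
    """BIT cost as a percentage of the baseline module value (note [1])."""
    return 100.0 * bit_value / module_value


# (BIT chips, BIT watts, BIT failure rate) and the baseline module values
BEM_BIT, BEM_BASE = (16, 7.2, 4.63), (70, 25.0, 25.0)
IOMOD_BIT, IOMOD_BASE = (7, 3.0, 1.86), (35, 20.0, 20.0)

if __name__ == "__main__":
    for name, bit, base in (("BEM", BEM_BIT, BEM_BASE),
                            ("IOMOD", IOMOD_BIT, IOMOD_BASE)):
        a, p, fr = (percent_increase(b, m) for b, m in zip(bit, base))
        print(f"{name}: A={a:.1f}%  P={p:.1f}%  FR={fr:.1f}%")
```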
TABLE 8.6.  SUMMARY OF PERFORMANCE/COST FIGURES FOR MODULE LEVEL BIT

Parameter       Composite [1]

P_SFD           65.4%
T_SFD [2]       1-2 usec
P_FA [3]        0%
P_LFE [4]       100%
T_LFE [5]       1-2 usec
A               21.5%
P               18.9%
FR [6]          3.3%

Notes:

[1]  Composite figures are calculated for a single processor computer system
     given in Table 8.4 as follows:

     Composite P_SFD = Σ (P_SFD of the module x percent distribution of
                          failure rate for that type of module)

     Composite A     = Σ (A of the module x percent distribution of number
                          of chips for that type of module)

     Composite P     = Σ (P of the module x percent distribution of the
                          power dissipation for that type of module)

     Composite FR    = Σ (FR of the module x percent distribution of
                          failure rate for that type of module)

[2]  Most module faults will be detected within one CPU instruction cycle
     time.

[3]  It is assumed that false alarms can occur due to faulty BIT circuit. If
     the BIT circuit is faulty, it means the module is faulty. Therefore,
     the possibility of false alarms at module level is zero.

[4]  Module Level BIT detects faults only within a module, i.e., detected
     faults are always localized to the module. Therefore, P_LFE = 100%.

[5]  For the Module Level BIT, fault detection and fault localization are
     synonymous functions.

[6]  Failure rate of memory module reduces due to the use of error
     correcting code.
the performance figures, several assumptions were made. These assumptions
are noted in Table 8.6. The probability of fault detection (P_SFD) for
the modules needs some clarification. P_SFD is estimated based on
the percentage of hardware functions monitored or tested by the BIT.
Estimates were made for each module depending on the complexity of its
subfunctions and the type of BIT circuits recommended for each subfunction.

The composite numbers given in the right-most column of Table 8.6
indicate the overall effectiveness of the Module Level BIT in the single
processor system configuration.

8.3  Evaluation of the Chassis Level BIT

The Chassis Level BIT as described in the previous sections consists of
microprocessor-based intelligence which resides in a separate module within
the chassis. It interfaces with a simple maintenance panel on the chassis
and is responsible for testing all modules within the chassis.

For the purpose of estimating hardware costs it is assumed that such a
tester can be implemented using hardware which is equivalent in complexity
to a DEC LSI-11 microcomputer with 4K words of memory to store the
diagnostic software. Table 8.7 lists the hardware cost for Chassis Level
BIT using the LSI-11 microcomputer. The details of the components used and
failure rate analysis are given in Appendix C. The power dissipation was
obtained from the DEC Logic Handbook [27].

The  cost  impact  of  the  Chassis  Level  BIT  on  each  chassis  differs 
slightly  depending  on  the  module  complement  within  the  chassis.  The  lower 
half  of  Table  8.7  shows  the  percent  increases  in  space,  power,  and  failure 
rate  due  to  the  Chassis  Level  BIT  for  each  type  of  chassis. 

For the Chassis Level BIT, diagnostic software must be developed. The
type of diagnostic test routines required for each chassis will depend on
the module complement within the chassis. It is assumed that very simple
diagnostic routines will be used at the chassis level to perform functional
tests on each module. An estimate was made of the type of tests required
for each module and the number of instructions per test. From this the
execution time required per test was estimated. The details of these
estimates are given in Appendix C. The results are summarized in Table 8.8.
TABLE 8.7.  HARDWARE COST ESTIMATION FOR CHASSIS LEVEL BIT

BIT Hardware                      Number of     Maximum Power   Failure Rate   Number of
                                  Chips/        Dissipation     /10^6 hrs.     Modules
                                  Components

Single board microcomputer        70            25.2 W          29.9           1
(LSI-11 or equivalent) with
4 KW random access memory

Chassis Maintenance Panel          9             1.6 W           2.67
with switches and indicators

Total                             79            26.8 W          32.6           1

Percent Increase [1]              A             P               FR             A*

Main Computer Chassis No. 1       8.7%          5.5%            4.9%           9.1%
Memory Expansion Chassis          8.4%          5.8%            4.3%           [illegible]
I/O Expansion Chassis             15%           7.4%            8.3%           7.7%

Notes:

[1]  Percent increases for space (A), power (P) and failure rate (FR) are
     calculated as follows:

     A  = (# of chips for Chassis Level BIT/chassis) /
          (Total # of chips/chassis)

     A* = (# of modules for Chassis Level BIT/chassis) /
          (Max. # of modules/chassis)

     P  = (Max. power dissipation for Chassis Level BIT/chassis) /
          (Max. power dissipation/chassis)

     FR = (Failure rate for Chassis Level BIT/chassis) /
          (Total failure rate/chassis)


TABLE 8.8.

Chassis            Diagnostic        Number of           Max. Execution
                   Test              Instructions [1]    Time [2]

Main Computer      Memory Test       1.5 x 150           2 x 1.4 sec
Chassis No. 1      Basic CPU Test    1 x 250             1 x 2.5 msec
                   BEM Test          1.5 x 100           2 x 1.0 msec
                   BIM2 Test         1 x 100             1 x 1.0 msec
                   IOMOD Test        2.5 x 100           4 x 1.0 msec
                   PCM2 Test
                   Total             975                 2.8 sec

Memory Expansion   BEM Test          1 x 100             1 x 1.0 msec
Chassis            Memory Test       3.5 x 150           6 x 1.4 sec
                   PCM2 Test
                   Total             625                 8.4 sec

I/O Expansion      BEM Test          1 x 100             1 x 1.0 msec
Chassis            BIM2 Test         1 x 100             1 x 1.0 msec
                   IOMOD Test        5.5 x 100           10 x 1.0 msec
                   PCM2 Test
                   Total             750                 12.0 msec

Notes:

[1]  Total Number of Instructions = ((N+1)/2) x Number of Instructions per
     Test, where N = Number of Modules.

[2]  Maximum Execution Time = N x Execution Time per Test,
     where N = Number of Modules.

[3]  Estimation includes the fault detection capability at module level.

[4]  Total P_SFD calculated for a single processor computer system:

     Total P_SFD = Σ (P_SFD for the module x percent distribution of
                      failure rate for that type of module).

For detailed analysis on diagnostic software requirements refer to
Appendix C.
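
The scaling rules in the notes to this table can be exercised numerically.
The sketch below reads the instruction multiplier as (N+1)/2, which matches
the multipliers that appear in the table (1.5 for two modules, 3.5 for six,
5.5 for ten), and uses N times the per-test time for the maximum execution
time. The tuple layout and names are illustrative only.

```python
def diagnostic_cost(tests):
    """Return (total instructions, max execution time in seconds).

    tests: iterable of (modules_sharing_test, instructions_per_test,
    seconds_per_test).
    """
    instructions = sum((n + 1) / 2 * instr for n, instr, _ in tests)
    seconds = sum(n * t for n, _, t in tests)
    return instructions, seconds


MAIN = [(2, 150, 1.4), (1, 250, 2.5e-3), (2, 100, 1e-3),
        (1, 100, 1e-3), (4, 100, 1e-3)]
MEMORY = [(1, 100, 1e-3), (6, 150, 1.4)]
IO = [(1, 100, 1e-3), (1, 100, 1e-3), (10, 100, 1e-3)]

if __name__ == "__main__":
    for name, tests in (("Main", MAIN), ("Memory", MEMORY), ("I/O", IO)):
        instr, secs = diagnostic_cost(tests)
        print(f"{name}: {instr:.0f} instructions, {secs:.4f} s")
```

The instruction totals come out to 975, 625, and 750. The execution times
are dominated by the 1.4 sec memory tests where they are present; for the
I/O Expansion Chassis the total is 12 msec, which agrees with the 6.0 msec
half-time listed for that chassis in Table 8.9.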


In Table 8.8 the number of instructions and execution time per test
have been multiplied by a factor which takes into account the number of
modules using the same test software. The probability of fault detection
(P_SFD) includes the fault detection capability of the Module Level BIT.
This is because the Chassis Level BIT will exercise the modules using
certain standard test patterns while the Module Level BIT will also be
simultaneously checking the module hardware.

Finally, the performance/cost figures for the Chassis Level BIT are
summarized in Table 8.9. The hardware and software cost figures are taken
directly from Tables 8.7 and 8.8. The assumptions made in estimating the
BIT performance are stated in the notes in Table 8.9. The composite
numbers indicate the overall effectiveness of the Chassis Level BIT for the
single processor system configuration.

8.4  Evaluation  of  the  System  Level  BIT 

The System Level BIT consists mainly of software routines to test the
hardware in the whole system. This includes all modules, chassis,
peripherals, and their interconnecting buses. Such software is generally
very extensive and runs under a diagnostic operating system in an off-line
mode. There are several categories of software diagnostic programs such as
system exercisers, subsystem (module in MCF context) exercisers,
reliability tests, etc. Such tests typically require large amounts of
memory and normally rely on external mass storage devices. It is difficult
to evaluate the impact of additional hardware BIT on such off-line
diagnostic software for several reasons. First, the hardware BIT is mainly
geared towards on-line fault monitoring. Second, the specific diagnostic
software code depends on the implementation of the hardware logic rather
than the hardware F3 specifications. Third, at this time only the
architecture and not the detailed specifications for the MCF software are
available. Although the AN/UYK-41(V) emulates the PDP-11/70 such that
time-independent software written for the PDP-11/70 would be transferable
to the AN/UYK-41(V), this does not mean that the DEC PDP-11/70 diagnostic
software can be meaningfully run on the AN/UYK-41(V).

In  view  of  the  above  considerations,  the  evaluation  of  the  System 
Level  BIT  has  been  restricted  to  software  for  the  microdiagnostics  and  that 
TABLE 8.9.  SUMMARY OF PERFORMANCE/COST FIGURES FOR CHASSIS LEVEL BIT

Parameters    Main Computer    Memory          I/O             Composite
              Chassis No. 1    Expansion       Expansion       [1]
                               Chassis         Chassis

P_SFD         76%              86.9%           79.8%           81.4%
T_SFD [2]     1.4 sec          4.2 sec         6.0 msec        4.2 sec
P_FA [3]      4.9%             4.3%            8.3%            5.4%
P_LFE [4]     100%             100%            100%            100%
T_LFE [2]     1.4 sec          4.2 sec         6.0 msec        4.2 sec
A             9%               8.7%            16.1%           10.5%
P             5.5%             5.8%            7.4%            6.1%
FR            4.9%             4.3%            8.3%            5.4%
OS            N.A.             N.A.            N.A.            N.A.
AS            N.A.             N.A.            N.A.            N.A.
DS            975              625             750             783
              Instructions     Instructions    Instructions    Instructions

Notes:

[1]  Composite figures are calculated for a single processor computer
     system given in Table 8.3.

     Composite P_SFD = Σ (P_SFD of the chassis x percent distribution of
                          failure rate for that type of chassis)

     Composite T_SFD = T_LFE = Maximum of the three times.

     Composite P_FA  = Σ (P_FA of the chassis x percent distribution of
                          failure rate for that type of chassis)

     Composite A     = Σ (A of the chassis x percent distribution of number
                          of chips for that type of chassis)

     Composite P     = Σ (P of the chassis x percent distribution of power
                          dissipation for that type of chassis)

     Composite FR    = Σ (FR of the chassis x percent distribution of
                          failure rate for that type of chassis)

     Composite DS    = Average of the number of instructions

[2]  It is assumed that on the average the time to detect and the time to
     localize the faults are the same because chassis level testing is
     predominantly off-line.

     T_LFE = T_SFD = 1/2 x Max. Total Execution Time

[3]  It is assumed that the false alarms occur due to the failure of the
     Chassis Level BIT. Furthermore, it is assumed that every Chassis
     Level BIT failure results in a false alarm. Therefore, P_FA = FR.

[4]  It is assumed that the Chassis Level BIT will test only one module
     at a time. Therefore, P_LFE = 100%.


portion of the macrodiagnostics that can be used for on-line or idle time
testing.

Table 8.10 shows estimates of the microdiagnostic software requirements.
Several test routines were considered which check mainly the hard
core CPU logic and the accessibility of a portion of the memory where
macrodiagnostics reside.  The estimate for the execution time of the
microdiagnostics includes several passes through the tests.  The hardware
requirements for the additional micromemory have already been included in the
hardware cost estimates of the Module Level BIT for the CPU3 module.

The macrodiagnostics that are necessary to test a particular computer
system are largely influenced by the actual hardware implementation.  This
is true of a computer system whether or not it is designed with BIT.  It is
beyond the scope of this report to quantify the software necessary to
perform diagnostics on the MCF functional modules.  Therefore, meaningful
evaluation of the impact that built-in-test would have on the system level
diagnostic software has not been included in this report.  Table 8.11 shows
the cost and performance estimates for the system level BIT that is
contained in the microdiagnostic control store.

8.5  Summary of the Performance/Cost Evaluation

The performance/cost evaluation was done for the recommended built-in-tests
at the module, chassis, and system levels.  The performance and cost
parameters were evaluated with respect to an MCF AN/UYK-41(V) single
processor system configuration with 256K words of memory and 14 MCF I/O
channels.  The results of the BIT effectiveness evaluation are summarized in
Table 8.12.

From this table it can be seen that the previously described BIT
provides a high level of performance at a moderate cost.  The probability
that a fault will be detected by one of the three levels of BIT is
estimated to be around 90%.  Not only will faults be detected within about 4
seconds, but every fault that is detected will be localized to the faulty
module within the same period of time.  The probability of the BIT system
indicating a module is faulty when it is not faulty is estimated to be less
than 5%.  The cost of this capability has been estimated in terms of space,
TABLE 8.10.  MICRODIAGNOSTIC SOFTWARE REQUIREMENTS
FOR THE SYSTEM LEVEL BIT

Diagnostic                  Number of            Number       Maximum
Test                        Microinstructions    of Passes    Execution Time [1]

Microsequencer Test         100                  8            160 µsec
Register Test               150                  8            240 µsec
ALU Test                    400                  8            640 µsec
Control Panel Test          200                  8            320 µsec
MCF Bus Controller Test     300                  32           1920 µsec
Memory Test                 250                  32           1600 µsec

Total                       1150                              4.88 msec

Notes:

[1]  Maximum execution time = Number of microinstructions x Number of
passes x Average microinstruction execution time

Average microinstruction execution time is assumed to be 200 nanoseconds.
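
The formula in note [1] can be checked with a short calculation. The sketch below is illustrative only: the function and variable names are hypothetical, while the microinstruction counts, pass counts and the 200 nanosecond figure are those of Table 8.10.

```python
"""Illustrative check of the execution-time formula of Table 8.10."""

MICROINSTRUCTION_TIME_NS = 200  # assumed average, per the table note

# (test name, number of microinstructions, number of passes)
TESTS = [
    ("Microsequencer Test", 100, 8),
    ("Register Test", 150, 8),
    ("ALU Test", 400, 8),
    ("Control Panel Test", 200, 8),
    ("MCF Bus Controller Test", 300, 32),
    ("Memory Test", 250, 32),
]


def max_execution_time_us(microinstructions, passes,
                          time_ns=MICROINSTRUCTION_TIME_NS):
    """Maximum execution time of one test, in microseconds."""
    return microinstructions * passes * time_ns / 1000


def total_execution_time_ms(tests):
    """Sum of the maximum execution times of all tests, in milliseconds."""
    total_us = sum(max_execution_time_us(n, p) for _, n, p in tests)
    return total_us / 1000


if __name__ == "__main__":
    for name, n, p in TESTS:
        print(f"{name:<26}{max_execution_time_us(n, p):>8.0f} usec")
    print(f"{'Total':<26}{total_execution_time_ms(TESTS):>8.2f} msec")
```

Each per-test result matches the last column of the table, and the total comes to 4.88 msec.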


TABLE 8.11.  SUMMARY OF PERFORMANCE/COST
ESTIMATES FOR THE SYSTEM LEVEL BIT

Parameters      Microdiagnostics

P_SFD           90%
t_SFD           2.44 msec [1]
P_FA            0.15% [2]
P_LFE           100%
t_LFE           2.44 msec [1]
A               [3]
P               [3]
FR              [3]
OS              100 instructions
AS              Not Applicable
DS              1150 instructions

Notes:

[1]  It is assumed that

t_LFE = t_SFD = 1/2 x Max. Total Execution Time for the
Microdiagnostics.

[2]  It is assumed false alarms occur due to failure of the microdiagnostic
control store.  Furthermore, it is assumed that every diagnostic
control store failure results in a false alarm.  Therefore,

P_FA = FR of the microdiagnostic control store.

[3]  Cost of microdiagnostic hardware is included in the Module Level BIT
for the CPU3 module.
TABLE 8.12.  SUMMARY OF PERFORMANCE/COST ESTIMATES
FOR THE MODULE, CHASSIS AND SYSTEM LEVEL
BITS FOR A SINGLE PROCESSOR COMPUTER SYSTEM

                Module          Chassis         System
                Level           Level           Level
Parameters      BIT             BIT             BIT

P_SFD           65.4%           81.4% [1]       90% [1]
t_SFD           1-2 µsec        4.2 sec         2.44 msec
P_FA            0%              5.4%            0.15%
P_LFE           100%            100%            100%
t_LFE           1-2 µsec        4.2 sec         2.44 msec
A               21.5%           10.5%           [2]
P               16.9%           6.1%            [2]
FR              -13.3% [3]      5.4%            [2]
OS [4]          N.A.            N.A.            100
AS              N.A.            N.A.            N.A.
DS [4]          N.A.            783             1150

Notes:

[1]  P_SFD at chassis and system level include the fault detection
capability of the Module Level BIT.

[2]  Cost of microdiagnostic hardware is included in the Module
Level BIT for the CPU3 module.

[3]  Negative failure rate increase is due to the use of error
correcting code in the memory subsystem.

[4]  Numbers indicate number of additional assembly language level
instructions.
power, and failure rate increase.  The space and power needed are about 25%
more than for the example system without BIT.  However, due to the increased
reliability provided by error-correcting code on the memory modules, the
reliability of the total system with BIT is about 8% better than that of the
same system without BIT.  The impact on application programs is minimal and
the added cost of diagnostic and operating system software is predicted to be
quite small.


9.0  SUMMARY  AND  RECOMMENDED  FURTHER  WORK 



The  preceding  sections  of  this  report  have  discussed  built-in-test  as 
a means  of  accomplishing  the  Military  Computer  Family  maintenance  goals  of 
1)  continuous  system  monitoring  and  indication  of  malfunction,  2)  diagnosis 
of  system  malfunction  to  a module  level  with  a low  probability  of  a false 
module  pull,  and  3)  measurement  and  recording  of  module  elapsed  time  when  a 
module  is  pulled.  The  approach  taken  in  this  study  was  to  assume  a fault 
population,  predict  where  in  the  system  these  faults  are  most  likely  to 
occur  and  develop  a rationale  for  deploying  built-in  fault  detection  and 
localization  resources  accordingly. 

In the case of the AN/UYK-41(V) (PDP-11/70) configuration, at room
temperature approximately 60% of all faults will occur in the memory, 30%
will occur in the CPU and 10% will occur in the remainder of the system.

To detect and localize these faults with maximum probability of detection
(P_SFD) and minimum additional maintenance hardware and software, a
rationale was developed to show that module, chassis and system level BIT
should be used and that fault reporting should proceed from the lowest to the
highest level of complexity.  It was concluded using this approach that up to
80% of all faults can be detected with 10% additional hardware, 80-90% with
20% more hardware and 90-95% with 30% more hardware.  To detect the remaining
5-10% and to minimize the probability of returning modules which are not
faulty to vendors, an on-site module/chassis off-line tester is recommended.
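
The figures quoted above lend themselves to a small worked example. The sketch below is only an illustration under stated assumptions: the function names and the sample of 1000 faults are hypothetical, while the 60/30/10 fault split and the coverage ranges per hardware increment are those given in the text.

```python
"""Illustrative arithmetic on the fault distribution and coverage ranges."""

# Fraction of all faults expected in each area (from the text).
FAULT_DISTRIBUTION = {"memory": 0.60, "cpu": 0.30, "remainder": 0.10}

# Additional BIT hardware (percent) -> (lower, upper) detection coverage.
# The text gives only an upper bound ("up to 80%") for the 10% case.
COVERAGE_BY_HARDWARE = {
    10: (None, 0.80),
    20: (0.80, 0.90),
    30: (0.90, 0.95),
}


def faults_by_area(total_faults):
    """Split a hypothetical number of faults across the three areas."""
    return {area: round(total_faults * share)
            for area, share in FAULT_DISTRIBUTION.items()}


def detected_bounds(total_faults, hardware_percent):
    """Lower and upper bounds on detected faults for a hardware increment."""
    low, high = COVERAGE_BY_HARDWARE[hardware_percent]
    low_count = None if low is None else round(total_faults * low)
    return low_count, round(total_faults * high)


if __name__ == "__main__":
    print(faults_by_area(1000))        # memory 600, cpu 300, remainder 100
    print(detected_bounds(1000, 30))   # (900, 950)
```

For 1000 assumed faults and 30% additional hardware, between 50 and 100 faults would remain undetected by BIT, which is the population the recommended off-line tester is meant to address.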

Since the ultimate outcome of this study will be amended MCF AN/UYK-41(V)
form, fit and function specifications with built-in-test, the role of
BIT at each level was characterized in terms of flexibility, intelligence,
observability and operator accessibility.  Various fault reporting schemes
were evaluated with the requirement for minimizing false module pulls due
to faulty checkers in mind.  A summary of BIT functions at each level is
given in Table 4.2.


During  the  course  of  this  preliminary  study  of  MCF  built-in-test 
requirements  and  alternatives,  a number  of  issues  were  identified  which 
were beyond the scope of the present effort but which nevertheless deserve
some  comment  in  the  present  work.  The  remainder  of  this  section  will  be 
devoted  to  a discussion  of  these  issues. 

9.1  Fault  Tolerant  Aspects  of  MCF 

Central to the MCF maintenance philosophy discussed in this report is
the idea of on-line fault detection and isolation, and off-line manual
repair through module replacement.  This study has not specifically addressed
the incorporation of fault tolerant computing techniques which lead to
automatic fault detection, isolation, and on-line reconfiguration.  While
the emphasis of this study was not on fault tolerant computing as opposed
to fault detection, isolation and repair techniques, it does not mean that
fault tolerance should be overlooked.  It is recommended that in future
studies emphasis be given to fault tolerant computing approaches.  In
particular, fault tolerant multi-processor hardware structures should be
investigated.


9.2  On-Line  BIT  Relationship  to  Off-Line  Automatic  Test  Equipment 
(ATE) 

The present study has primarily addressed the question of what can be
done with on-line fault monitoring facilities assuming a two-level
maintenance philosophy.  The study did not specifically consider tradeoffs
between BIT and ATE from a system life cycle cost (LCC) standpoint.  However,
the study did recommend the use of some kind of off-line (but on-site) test
facilities for minimizing the probability of returning non-faulty modules
to vendors.

9.3  Implications  of  Future  Very  Large  Scale  Integrated  (VLSI) 
Circuitry 

Very large scale integrated circuitry will make possible the realization
of even more complex functions within the bounds of the MCF form and
fit specifications.  One result of this could be the ability to include
massive hardware redundancy on each module.  Another consequence could be
multiprocessor hardware structures at very low cost which can effectively
emulate prior generation machines while at the same time offering the fault
tolerant benefits of inherently redundant structures.

9.4  Register Transfer Level BIT Simulations

Future work on BIT for MCF should include register-transfer (R-T)
level simulations of MCF members with and without proposed BIT approaches.
Such simulations would be useful for verifying fault detection
effectiveness and for predicting the amount of additional circuitry required
for BIT.
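
To suggest how simulation can verify fault detection effectiveness, the sketch below injects single stuck-at faults into a small circuit and measures the fraction detected by a set of test vectors. It is an illustration under stated assumptions only: the circuit is a hypothetical gate-level full adder rather than an MCF module or an R-T level model, and all names are invented for the example.

```python
"""Illustrative single stuck-at fault simulation of a full adder."""

from itertools import product

NETS = ["a", "b", "cin", "x1", "s", "a1", "a2", "cout"]


def simulate(a, b, cin, fault=None):
    """Evaluate the adder; `fault` is (net, stuck_value) or None."""
    nets = {}

    def drive(name, value):
        # A faulty net ignores its driver and holds the stuck value.
        if fault is not None and fault[0] == name:
            value = fault[1]
        nets[name] = value

    drive("a", a)
    drive("b", b)
    drive("cin", cin)
    drive("x1", nets["a"] ^ nets["b"])
    drive("s", nets["x1"] ^ nets["cin"])
    drive("a1", nets["a"] & nets["b"])
    drive("a2", nets["x1"] & nets["cin"])
    drive("cout", nets["a1"] | nets["a2"])
    return nets["s"], nets["cout"]


def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by `vectors`."""
    faults = [(net, value) for net in NETS for value in (0, 1)]
    detected = 0
    for fault in faults:
        # A fault is detected if any vector makes an output differ.
        if any(simulate(*v, fault=fault) != simulate(*v) for v in vectors):
            detected += 1
    return detected / len(faults)


if __name__ == "__main__":
    exhaustive = list(product((0, 1), repeat=3))
    print(fault_coverage(exhaustive))    # 1.0
    print(fault_coverage([(0, 0, 0)]))   # 0.5
```

The all-zero vector alone exposes only the stuck-at-1 faults, which is why the coverage is one half; the same bookkeeping applied to a model with and without BIT circuitry is the kind of comparison the recommendation has in mind.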
active test.  One which generates its own test vectors; a closed-loop test.

automatic test equipment (ATE).  Equipment that is designed to conduct
analysis of functional or static parameters to evaluate the degree of
performance degradation and that may be designed to perform fault
isolation of unit malfunctions.  The decision making, control, or
evaluative functions are conducted with a minimum reliance upon human
intervention.  (MS1309B)

availability.  A measure of the degree to which an item is in the operable
and committable state at the start of the mission, when the mission is
called for at an unknown (random) point in time.  (MS721B)
Availability is the probability of system readiness over a long
interval of time.

built-in-test (BIT).  A test approach using BITE or self-test hardware or
software to test all or part of the unit under test (UUT).  (MS1309B)

built-in-test equipment (BITE).  Any device which is part of an equipment
or system and is used for the express purpose of testing that
equipment or system.  BITE is an identifiable unit of the equipment or
system.  (MS1309B)

casualty.  A manifestation of a failure at the system level or major
subsystem level such that the system/subsystem is incapable of performing
its principal function(s).  A casualty is differentiated from a
malfunction by the greater seriousness or persistence of its nature.

catastrophic fault, analog.  A fault in analog circuitry which causes a
sudden change in operating characteristics which results in a complete
lack of useful performance.  (ATG)

catastrophic fault, digital.  A primary failure in digital circuitry which
causes secondary failures.

closed-loop testing.  Testing in which the input stimulus is controlled by
the equipment output monitor.  (MS1309B)

confidence test.  A go/no-go test.

controllability.  An attribute of equipment design which defines or
describes the extent to which signals of interest may be controlled.

critical failure.  A failure which results in a casualty.

diagnostic test.  A test designed to perform fault isolation.
disruptive test.  One which destroys (changes) the state of the hardware or
software.

dynamic test.  A test of one or more of the signal properties or
characteristics of an equipment or any of its constituent items performed
such that the parameters being observed are measured and assessed with
respect to a specified time aperture or response.  (ATG)

error.  Any discrepancy between a computed, observed, or measured quantity
and the true, specified, or theoretically correct value or condition.
(IEEE)

external ATE.  ATE which is physically separated from the unit under test
when the UUT is in its operational environment.

failure.  The termination of the ability of an item to perform its required
function.  (IEEE)  A failure is the functional manifestation of a
fault.

failure analysis.  The logical, systematic examination of an item or its
diagram(s) to identify and analyze the probability, causes, and
consequences of potential and real failures.  (MS721B)

failure mode.  A failure classification.

failure universe/failure population.  The failures which correspond to a
selected fault population.  This is used as a basis for the design
and evaluation of tests.

false alarm.  An indicated fault where no fault exists.  (MS1309B)

false alarm rate.  The frequency of occurrence of false alarms.

fault.  A physical condition that causes a device, component, or element to
fail to perform in a required manner; for example, a short-circuit or
a broken wire.  (IEEE)

fault coverage/failure coverage.  An attribute of a test or test procedure
expressed as the percent of faults of the failure population which
that test or test procedure will detect.

fault detection.  A process which discovers or is designed to discover the
existence of faults; the act of discovering existence of a fault.

fault dictionary.  A list of elements where each element consists of a test
and all the faults detected by that test.  (IEEE/FTC)  Often only the
LRUs which contain the faults are listed.
fault isolation.  Where a fault is known to exist, a process which
identifies or is designed to identify the location of that fault
within a small number of replaceable units.

fault localization.  Where a fault is known to exist, a process which
identifies or is designed to identify the location of that fault
within a general area of equipment.  Fault localization may be less
specific than fault isolation.

fault population.  The totality of faults which may be incurred by a
device.

fault prediction.  A process used to predict that some component will be
out of tolerance before the next scheduled maintenance period based
upon the present measurement of component parameters.

fault signature.  An output test vector resulting from the testing of a
unit containing one or more faults.

fault tolerance.  The capacity of a computer, subsystem, or program to
withstand the effects of internal faults; the number of
error-producing faults a computer, subsystem, or program can endure before
normal functional capability is impaired.  (IEEE/FTC)

functional fault.  A fault which can be described by a change in function
of some identifiable portion of a system.  (IEEE/FTC)  A failure.

functional modularity.  The splitting of a system into parts or modules
based on the function or purpose of these parts.  (IEEE/FTC)

functional partitioning.  The physical or electrical separation of system
elements along interfaces which define and isolate these elements on
bases of function or purpose.

functional test.  A test which is intended to exercise an identifiable
function of a system.  (IEEE/FTC)  The function is tested independent
of the hardware implementing the function.

go/no-go test.  A test designed to yield a "test pass" or "go" indication
in the absence of faults in a UUT, and a "test fail" or "no-go"
indication in the presence of fault(s).

hard core.  That kernel of circuitry in a processor or system which must
be functioning properly in order for that processor or system to
successfully execute tests of other portions of itself.

hard core failure.  A failure in the hard core logic of a system which
inhibits normal self-test of the system.
idle time test.  One which occurs when the system (or resource) is idling.

initialize.  (1) To establish an initial condition or starting state; for
example, to set logic elements in a digital circuit or the contents of
a storage location to a known state so that subsequent application of
digital test patterns will drive the logic elements to another known
state; and (2) to set counters, switches, and addresses to zero or
other starting values at the beginning of, or at prescribed points in,
a computer routine.  (MS1309B)

input test vector.  A test pattern.

interfering test.  One which degrades the performance of the on-line system
(or resource) operation.

intermittent fault.  A temporary fault.  (IEEE)

inverted-pyramid/building-block.  Descriptive terms characterizing a test
or test technique whereby the smallest possible portions of hardware
are tested first in the test sequence, and subsequent tests utilize
previously verified hardware for execution.

latent fault time.  The extent or duration of time during which an existing
fault is undetected; the elapsed time between fault occurrence and
fault detection.

line replaceable unit (LRU).  A unit which is designated by the plan for
maintenance to be removed upon failure from a larger entity
(equipment, system) in the latter's operational environment.  (MS1309B)

maintainability.  A characteristic of equipment design and installation
which is expressed as the probability that an item will be retained
in or restored to a specified condition within a given period of time,
when the maintenance is performed in accordance with prescribed
procedures and resources.  (MS721B)

malfunction.  An error.

marginal fault.  A failure such that some equipment function is impaired or
out of tolerance and is of a nature such that catastrophic failure
does not occur.

mean-time-between-maintenance (MTBM).  The mean of the distribution of the
time intervals between maintenance actions (either preventive,
corrective, or both).  (MS721B)  Includes actions due to false alarms.
mean-time-to-isolate.  The average time required to achieve fault isolation
as measured from the time of fault detection to the time of fault
isolation.

mean-time-to-localize.  The average time required to achieve fault
localization as measured from the time of fault detection to the time of
fault localization.

mean-time-to-repair (MTTR).  The total corrective maintenance time divided
by the total number of corrective maintenance actions during a given
period of time.  (MS721B)

multiple failure.  A joint occurrence of two or more single failures.
(IEEE)

non-disruptive test.  One which does not destroy (change) the state of the
hardware or software.

non-interfering test.  One which does not degrade the performance of the
on-line system (or resource) operation.

observability.  An attribute of equipment design which defines or describes
the extent to which signals of interest may be observed.

off-line test.  One which occurs when the system (or resource) is off-line.

on-line test.  One which occurs concurrent with the on-line operation of
the system (or resource).  On-line tests are either continuous or
sampled (which occur periodically).

passive test.  One which does not generate its own test vectors; an
open-loop test.

periodic test.  Off-line test which occurs periodically.

random fault/random failure.  An intermittent fault whose occurrence is
predictable only in a statistical sense.

readiness.  A state of being ready to successfully perform or being in the
act of successfully performing a defined mission.

readiness test.  A test specifically designed to determine whether an
equipment or system is operationally suitable for a mission.
(MS1309B)
reconfiguration.  A repair strategy in which failing components are
switched out of operation and replaced by failure-free components.
(IEEE/FTC)

recovery.  The continuation of system operation with error-free data after
an error occurs.  (IEEE/FTC)

redundance, redundancy.  The introduction of auxiliary elements and
components into a system to perform the same functions as other elements
in the system for the purpose of improving reliability and safety.
(IEEE)  Also, the use of additional components, programs, or repeated
operations, not normally required by the system to execute its
specified tasks, to overcome the effects of failures.  (IEEE/FTC)

repeatability.  A test characteristic such that repeated application of a
given set of stimuli to a UUT yields identical results.

resource (module) on-line.  When it is being used by the system.

resource (module) off-line.  When it is not being used by the system and is
not scheduled to be used.

resource (module) idling.  When it is not being used by the system but is
scheduled to be used.

solid fault.  A permanent fault.  (IEEE)

stuck fault/stuck failure.  A failure in which a digital signal is
permanently held in one of its binary states.  (IEEE)

symptom.  The manifestation or evidence of a particular failure condition.

system on-line.  When some of the resources (modules) are being used for
the applications.

system off-line.  When none of the resources (modules) are being used for
the application and none are scheduled to be used.

system idling.  When none of the resources (modules) are being used for the
application but some are scheduled to be used.

test.  A procedure or action taken to determine under real or simulated
conditions the capabilities, limitations, characteristics,
effectiveness, reliability, or suitability of a material, device,
system, or method.  (MS1309B)
test pattern.  A simultaneous or parallel definition of all
system.  (IEEE/FTC)

test point.  A node within a circuit or system which can be measured or
stimulated to facilitate testing.

transient failure.  A failure induced by a momentary or temporary external
factor such as input power fluctuation, excessive ambient temperature
excursion, electromagnetic interference, or by factors internal to a
system.  A solid fault may cause a transient failure.
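
Several of the definitions above are quantitative and can be illustrated with a short sketch. The code is hypothetical throughout (the function names, test names and fault names are invented for the example); it simply applies the glossary wording for fault dictionary, fault coverage and mean-time-to-repair.

```python
"""Illustrative use of three glossary definitions."""


def fault_coverage(fault_dictionary, fault_population):
    """Percent of the fault population detected by at least one test.

    `fault_dictionary` maps each test to the faults that test detects,
    following the glossary definition of a fault dictionary.
    """
    population = set(fault_population)
    if not population:
        raise ValueError("fault population must not be empty")
    detected = set()
    for faults in fault_dictionary.values():
        detected |= set(faults)
    return 100.0 * len(detected & population) / len(population)


def mean_time_to_repair(corrective_times):
    """Total corrective maintenance time divided by the number of actions."""
    if not corrective_times:
        raise ValueError("at least one corrective action is required")
    return sum(corrective_times) / len(corrective_times)


if __name__ == "__main__":
    # Hypothetical tests and faults, for illustration only.
    dictionary = {"test_1": {"f1", "f2"}, "test_2": {"f2", "f3"}}
    population = ["f1", "f2", "f3", "f4"]
    print(fault_coverage(dictionary, population))   # 75.0
    print(mean_time_to_repair([2.0, 4.0, 6.0]))     # 4.0
```

In the example, fault f4 is detected by no test, so the coverage of the two tests taken together is 75 percent.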



[PDP-11/70
[DATA.PATH
5F,7400
42F,7404
5F,7410
11F,7411
2F,7420
1F,7464
34,74153
9,74153
2,74157
3,74182]

[ALU.CONTROL
11F,7400
41F,7404
26F,7405
13F,7410
20F,7411
4F,7420
11F,7444
2F,7450
2F,7474
2F,7430
2F,74112
4,74153
10,74158
2,74174
5 MM
1,74288
8,3101]

[IR.DECODE
2F,7474
45F,7404
2F,7440
3F,7444
2,74157
5,8251
23F,7410
7F,DM8881
43F,7400
47F,7411
8F,7444
4F,7450
10F,7420
4,74153
5F,7465
1F,7405
2,MK8192]

[CONSOLE
19F,7400
55F,7404
3F,DM8881
12F,7417
1,DM9318]

[PROCESSOR.DATA.AND.UNIBUS.REGS
24,74153
47F,7404
3F,7400
12,74174
4,7485
2F,7440
14F,7474
14F,DM8881
13F,7401
2,74175
1,DM9318
7F,7410
3F,7411
2F,7420
2F,7450
1,74157
14F,DM8440]

[MICROSEQUENCER
12,74174
17,3401
47F,7404
9F,7411
28F,7474
1,74175
37F,7444
33F,7410
10F,7400
4F,7420
2F,7440
4,74153]

[TIMING.GENERATOR
4,7440
8,74112
5,7404
7,7474
5,7400
2,7445
4,7444
2,7411
2,74140
2,7420]
i ikar. anii.mso .nun mil.

L CACHE .CON  FROI. 

6. 7400 

3.7450 

12.7101 

4.74157 

’.7110 

8,74151 

19.7411 

4,71153 

!) » 7470 

1.7443  . 

1 . 7130 

1.7437 

3.7110 

1,74133 

1.7  150 

13.7400 

4.7161 

7,7404 

4.7174 

A, 7410 

4.74113 

10,7411 

1.74153 

3,7420 

3.71171 

11,7464 

l.liMa640 

13,7474 

3 . Dhfiiiai  ;i 

18,74112 

[ UNI  HUS . mill . U INSOLE  . CON  1 ROl 

2,74140 

3.7430 

2.74175 

3.7450 

1 ,74133 

1, DM8801 

5,74174 

13.7100 

1 .DM86403 

9.7404 

ECACHE. MATA, RATH 

3.7410 

4.74157 

9.741  1 

5,7400 

3.7430 

4,7404 

1 A. 7174 

5,7410 

4.74 1 13 

6.7411 

4.741 10 

2,7420 

1.71157 

4 , 74 A 4 

5.74133 

6,7474 

1 .74193 

2.74140 

1 .DM088 1 

9,74158 

3.7413 

18 .DM8881 

3.74175 

1 2 » 71 1 75 

1, DM8610 3 

10.DM81301 

I AH  CRESS . MEMORY . BOARD 

6, DM8262 

1 » 7474 

11, DM8640 3 

2.74193 

C DAT A. MEMORY 

1.7400 

8,7404 

4, 7401 

10,74153 

1.7410 

72,93410 

3. 74  1 1 

4, DM8262 

1 . 7464 

3.7400 

9.74140 

18,7140 

14.71153 

5,74643 

lS.PHcont 

7. 74 1 75 

9 . DM8262 

2 » DI18A40 

A . 7405 

30 . 9341 0 J 
I SYSTIM.ADDM I A I !l
1 . 7421 
I .713-’ 

1.7100 
, 7105 

а,  zitoi 

5.7100 

9 . 7101 

I , 7105 
7,71  to 

II, 7111 

3.7170 
A,  7110 

б, 7161 

3.7171 
3,71153 

7.71157 

3.71158 
2,71102 
24,31013 

l SYS . DESC/CNSL  . C.AB1  ES 

1 ,7474 

3.7404 

5 , DM8081 
1,3815 
7,7401 
2 , 7 13.1 

7,74153 

3.7400 

2.7400 

12.7404 

1.7410 

9.7411 
5,7420 

1,7440 
3 , 7164 
3,74158 
2,741  '1 
1,74155 

2,74157 
10,74133 

2,74157 
5,74713 


LS YSTLN . STA3  US . W.ti  f S 1 EK 

8.74153 
1,7442 
2,74 57 
1,7408 

6,7400 

9.7404 

4.7410 

8.7411 

3.7405 

5.7440 
0,7461 

3.74157 
8,74171 
4,74175 

1.74157 

8.7474 
4,36013 

[UNIBUS.MAP

7.74157 
3,74174 
12, DM8881 

3.74153 

2.7474 
5,71181 

9.7404 
1,74182 
1 ,7450 

4.7411 
3,7  100 
2,7430 
2 » 7420 
5, DM808 l 
10, DM0640 
2 f 7485 

1.7405 

3.7440 

1.74158 
1,74103 




l f'i;vn:  1 1 itM . pr<rjn:s>-‘Of'f . i ow . opder 

CP  ROUT  II )il . PRC 

l • 0MS2  12 

2 . 104324  2 

8 . 4 1 53 

1,74151 

8-7U91 

3,74153 

2, 7400 

2,7135 

* 

3,74181. 

1 • 9 1 0 

5,7100 

3 .741 1 

2, 74*) 4 

3.7  4 1?:i3 

a , 7 105 

5,74157 

5,7110 

7 , 74158 

3,7411 

lit  7417  4 

3,7420 

3,71175 

3,7-1140 

17,7*1157 

17,74153 

3,71187 

1,7474 

7,74139 

1,746 1 

34*74143 

14-74174 

1 ,3c>01  J 

15, 71175 

[FP.ROM.CONTROL

1,71157 

3.74151 

2,7451 

l - 7436 

1,71133 

1,74193 

3,74153 

9,7400 

6,74139 

3* 7404 

1,74182 

7,7110 

12,3601 3 

4.7411 

tftl.L.  FRACTION 

2,7420 

1 , DM3242 

9.7410 

7,74131 

3 , 7 1 A 4 

3,7400 

9,7474 

4,7404 

3 . •»  11 1 2 

2, 74 10 

7.71140 

3,7411 

3,7117  1 

1,7474 

1 • 74157 

5.74153 

4,74151 

17,74157 

5 • ~ 4 1 75 

7,74153 

5,7451 

12,74174 

t • 7442 

3,74157 

14. 3601 3 

3,74175 

1,7451 

1 , 7485 

7,7413° 

2,74143 

2,74132 

20,71113 

2,36011 

4CMFMl.il 


l t.»|..hEHORY 

?193 

i * ;m:.' 
s • ? loo 

1.7  403 

-'•7110 

1.7120 

i * ’4140 

3.74153 

3*7474 

2.DNS640 

5 . DM8341 

1.74138 

1*7^133 


3.74128 

. 7402 


.7  437 

[SPECIAL.FUNCTIONS
2F,DM8641
3F,7474
1,7412
5F,7404
1F,7400]

[BUS.ARBITRATION.LOGIC
1F,7400
1F,DM8837
3F,7474
1F,DM8641]

[INTERRUPT.CONTROL.AND.RESET.LOGIC
4F,7404
4F,7474
2F,DM8641
2F,7400
5F,DM8837
1F,7405
1F,74174]

[CLOCK.PULSE.GENERATOR
1F,7400
1F,74140
2F,7474
1F,74139
6F,7404
4F,MH0026]
[ROM.CHIPS
3,CP1631B]

[DATA.CHIP
1,CP1611B]

[CONTROL.CHIP
1,CP1621B]

[BUS.DRIVERS.AND.RECEIVERS
4,74257
4,DM8641
1F,DM8641
4F,7411
2F,7405]

[MEMORY
16,MK4096]


[BUS.I/O.CONTROL.LOGIC
1F,7477
7F,7400
7F,7404
2F,7411
4F,7474
5F,7410
5F,DM8641
1F,DM8837]

[I/O.BUS.MEM.READ.DATA.MUX
4F,7475
2F,74257
3F,7410
3F,7400
2F,74140
2F,7405
2F,74107]

[FAST.PIN.MUX
1F,74257
1F,7400
1F,7404]


[F.8.POWER.SUPPLY.MICROPROCESSOR

[PROCESSOR
1,3870]

[A/D
1,8703]

[OCTAL.LATCH
1,9348]

[INTEL.8080.BOARD
1,8080
6,8212
8,8101
2,8316]
This appendix lists the type and approximate number of packages necessary
to implement the BIT techniques discussed in this report. Each listing also
gives the failure rate (FR) and the maximum power consumption of each
integrated circuit. The failure rate data were obtained from the reliability
analysis computer program at Carnegie-Mellon University. A temperature of
25°C was used so that the data would be comparable with the module failure
rates specified in the ITEK documents. The maximum power consumption data
were generally obtained from manufacturers' data sheets. The software cost
estimates follow the hardware cost estimates.
CORADCOM-76-0100-F


MODULE: VRAM32

BIT TECHNIQUE: SINGLE ERROR CORRECTION WITH DOUBLE ERROR DETECTION

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

16K x 1 RAM               MK16384       12         750             2.49
Parity Gen/Check          DM8262        7          300             .236
4-to-16 Line Decoder      54154         2          350             .251
Multiplexer               54257         3          250             .191
Exclusive-OR              5486          4          275             .191
Misc. Control             54XX          4          200             .165

MODULE TOTAL                            32         14.5 W          34.03*

*With no considerations for fault tolerance.

Effective Module Failure Rate with ECC as shown above is 53.9 /10^6
hours.
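
The MODULE TOTAL row of the SEC-DED listing can be reproduced from the per-IC figures by summing quantity times per-IC value. The following is a minimal sketch of that arithmetic, assuming a simple series model in which power and failure rates add across packages; the names `SEC_DED_PARTS` and `module_totals` are illustrative and do not come from the report.

```python
# Rows copied from the VRAM32 SEC-DED listing:
# (part name, quantity, max power per IC in mW, failure rate per IC per 10^6 hours)
SEC_DED_PARTS = [
    ("16K x 1 RAM", 12, 750, 2.49),
    ("Parity Gen/Check", 7, 300, 0.236),
    ("4-to-16 Line Decoder", 2, 350, 0.251),
    ("Multiplexer", 3, 250, 0.191),
    ("Exclusive-OR", 4, 275, 0.191),
    ("Misc. Control", 4, 200, 0.165),
]


def module_totals(parts):
    """Return (IC count, total power in mW, total failure rate per 10^6 hours).

    Assumes every package contributes independently, so totals are plain sums.
    """
    count = sum(qty for _, qty, _, _ in parts)
    power_mw = sum(qty * mw for _, qty, mw, _ in parts)
    failure_rate = sum(qty * rate for _, qty, _, rate in parts)
    return count, power_mw, failure_rate


count, power_mw, failure_rate = module_totals(SEC_DED_PARTS)
print(count, power_mw, round(failure_rate, 2))
```

The sums come to 32 packages, 14450 mW (listed as 14.5 W), and about 34.03 failures per 10^6 hours, matching the table. The same function applies to the other listings in this appendix.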


MODULE: VRAM32

BIT TECHNIQUE (ALTERNATE): BYTE PARITY

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

16K x 1 RAM               MK16384       4          750             2.49
Parity Gen/Check          DM8262        2          300             .236
Misc. Control             54XX          2          200             .165

MODULE TOTAL                            8          4.0 W           10.76
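
The extra RAM counts in the two VRAM32 listings (12 devices for SEC-DED, 4 for byte parity) are consistent with the standard check-bit requirements for these codes. The sketch below illustrates this under two assumptions that are not stated in this appendix: that the module stores 16-bit words, and that it is organized as two banks of 16K x 1 devices. The function names are hypothetical.

```python
def hamming_check_bits(data_bits):
    """Smallest r satisfying 2**r >= data_bits + r + 1 (single error correction)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def sec_ded_check_bits(data_bits):
    """Hamming check bits plus one overall parity bit for double error detection."""
    return hamming_check_bits(data_bits) + 1


def byte_parity_bits(data_bits):
    """One parity bit per 8-bit byte."""
    return data_bits // 8


WORD_BITS = 16  # assumption: 16-bit memory word
BANKS = 2       # assumption: two banks of 16K x 1 RAMs

print(BANKS * sec_ded_check_bits(WORD_BITS))  # extra RAMs for SEC-DED
print(BANKS * byte_parity_bits(WORD_BITS))    # extra RAMs for byte parity
```

Under these assumptions the sketch yields 12 and 4 extra one-bit-wide RAMs, the quantities listed for the MK16384 in the two tables.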
MODULE: CPU3

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

FUNCTION: MICROSEQUENCER REGISTERS
BIT TECHNIQUE: PARITY

Latch                     9308          1          500             .335
Parity Gen/Check          DM8262        2          450             .236
Misc. Control             54XX          1          200             .165

FUNCTION TOTAL                          4          1.6 W           .97

FUNCTION: BUS DATA AND ADDRESS
BIT TECHNIQUE: PARITY

Parity Gen/Check          DM8262        4          300             .236
Misc. Control             54XX          1          200             .165

FUNCTION TOTAL                          5          1.4 W           1.11

MODULE: CPU3 (Continued)

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

FUNCTION: MICRODIAGNOSTIC CONTROL STORE
BIT TECHNIQUE: PARITY

2Kx4 ROM                  TMS4700       5          500             .561

FUNCTION TOTAL                          5          2.5 W           2.81

FUNCTION: TIMING
BIT TECHNIQUE: WATCHDOG TIMER

Counter (4-bit)           54192         1          500             .373

FUNCTION TOTAL (3 Timers)               3          1.5 W           1.12

FUNCTION: ROM STORAGE
BIT TECHNIQUE: PARITY

256x4 ROM                 54187         1          500             .153
Parity Gen/Check          DM8262        4          450             .236
Misc. Control             54XX          1          200             .165

FUNCTION TOTAL                          6          2.5 W           1.26
MODULE: CPU3 (Continued)

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

FUNCTION: OP-CODE
BIT TECHNIQUE: ILLEGAL OP-CODE CHECK

NAND (8-input)            5430                     50              .066

FUNCTION TOTAL                                     .1 W            .13

MODULE TOTAL                            51         18.8 W          14.16


MODULE: BEM

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)


MODULE: BIM2

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

FUNCTION: BUFFER
BIT TECHNIQUE: PARITY

Parity Gen/Check          DM8262        4          450             .236
Misc. Control             54XX          1          200             .165

FUNCTION TOTAL                          5          2.0 W           1.11

FUNCTION: CONTROL
BIT TECHNIQUE: WATCHDOG TIMER

Counter (4-bit)           54192         1          500             .373

FUNCTION TOTAL (2 Timers)

FUNCTION: UNIBUS CONTROL
BIT TECHNIQUE: LOOP AROUND

Latches
Misc. Control

FUNCTION TOTAL

MODULE TOTAL




MODULE: I/O MODULE

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

FUNCTION: BUFFER
BIT TECHNIQUE: PARITY

Parity Gen/Check          DM8262        4          450             .236
Misc. Control             54XX          1          200             .165

FUNCTION TOTAL                          5          2.0 W           1.11

FUNCTION: CONTROL
BIT TECHNIQUE: WATCHDOG TIMER

Counter (4-bit)           54192         1          500             .373

FUNCTION TOTAL (2 Timers)                          1.0 W           .75

MODULE TOTAL                                       3.0 W           1.86




MODULE: PCM2

BIT TECHNIQUE: MICROPROCESSOR PLUS A/D

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

Microprocessor            3870          1          750             2.8
A/D                       8703          1          250             .5
Latch                     74S373        1          500             .2
Sensors                   -             1*         -               7
Discretes                 -             1*         -               3

MODULE TOTAL                            5          1.5 W           13.5

*Equivalent on the basis of board space.


CHASSIS MAINTENANCE PANEL

                          Part Number              Maximum Power   F.R. per IC
Part Name                 (Typical)     Quantity   Per IC (mW)     (/10^6 hours)

Switches                  DPDT          5          -               .15
Alpha-Numeric Display                   4          400             .48

TOTAL                                   9          1.6 W           2.67


CHASSIS LEVEL SOFTWARE COSTS

                                  NUMBER OF       EXECUTION
MODULE                            INSTRUCTIONS    TIME [1]

RAM
  - Memory Data Test              50              350,000 cycles
  - Memory Controller Test        50              50 cycles
  - Memory Address Test           50              350,000 cycles
  TOTAL                           150             1.4 sec

CPU
  - Register Test                 50              1,000 cycles
  - Basic Instruction Test        200             250 cycles
  TOTAL                           250             2.5 msec

BEM
  - Parity Check Test             50              250 cycles
  - Loop Around Test              50              250 cycles
  TOTAL                           100             1.0 msec

BIM2
  - Parity Check Test             50              250 cycles
  - Loop Around Test              50              250 cycles
  TOTAL                           100             1.0 msec

I/O MODULES
  - Parity Check Test             50              250 cycles
  - Loop Around Test              50              250 cycles
  TOTAL                           100             1.0 msec

NOTE:

[1] Execution time computed by multiplying the number of cycles by a
typical microprocessor cycle time of 2 microseconds.
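
The TOTAL execution times in the table follow directly from note [1]. A minimal sketch of the calculation is given below, using the cycle counts from the RAM and CPU rows; the dictionary and function names are illustrative and not part of the report.

```python
CYCLE_TIME_S = 2e-6  # typical microprocessor cycle time from note [1]

# Cycle counts copied from the RAM and CPU rows of the table.
RAM_TESTS = {
    "Memory Data Test": 350_000,
    "Memory Controller Test": 50,
    "Memory Address Test": 350_000,
}
CPU_TESTS = {
    "Register Test": 1_000,
    "Basic Instruction Test": 250,
}


def execution_time_s(cycles_by_test, cycle_time_s=CYCLE_TIME_S):
    """Total execution time in seconds: sum of cycles times the cycle time."""
    return sum(cycles_by_test.values()) * cycle_time_s


print(f"RAM: {execution_time_s(RAM_TESTS):.4f} s")
print(f"CPU: {execution_time_s(CPU_TESTS) * 1e3:.2f} ms")
```

The RAM tests total 700,050 cycles, or about 1.4 sec, and the CPU tests total 1,250 cycles, or 2.5 msec, in agreement with the table. The 500-cycle totals for the BEM, BIM2, and I/O modules give 1.0 msec by the same rule.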